Classifying Cancers 



Reference to Material Presented in Appendix 

This patent application includes material comprising tables and data presented as 
Appendix A on CD-ROM. The one file on the accompanying CD-ROM is entitled 
AppendixA.xls (2,868 kb), which is a Microsoft Excel Worksheet. The CD-ROM was 
created on August 2, 2001. The format is IBM-PC. The operating system is MS-Windows 
98. The file on the CD-ROM is incorporated herein by reference. 

Background of the Invention 

Cancer is the second leading cause of death in the United States after cardiovascular 
disease (Boring et al Cancer J. Clin. 43:7, 1993; incorporated herein by reference). One in 
three Americans will develop cancer in his or her lifetime, and one of every four Americans 
will die of cancer. In order to better combat this deadly disease, efforts have recently focused 
on fine tuning the categorization of tumors; by categorizing cancers, physicians hope to better 
treat an individual's cancer by providing more effective treatments. Researchers and 
physicians have categorized cancers based on invasion, metastasis, gross pathology, 
microscopic pathology, immunohistochemical markers, and molecular markers. With the
recent advances in gene chip technology, researchers are increasingly focusing on the 
categorization of tumors based on the expression of marker genes. 

The most common human cancers are malignant neoplasms of the skin (Hall et al J. 
Am. Acad. Dermatol 40:35-42, 1999; Weyers et al Cancer 86:288-299, 1999; each of which 
is incorporated herein by reference). The incidence of cutaneous melanoma is rising 
especially steeply, with minimal progress in non-surgical treatment of advanced disease 
(Byers et al Hematol Oncol Clin. North Am. 12:717-735, 1998; McMasters et al Ann. Surg. 
Oncol. 6:467-475, 1999; each of which is incorporated herein by reference). Despite
significant effort to identify independent predictors of melanoma outcome, no accepted 
histopathological, molecular, or immunohistochemical marker defines subsets of this
neoplasm (Weyers et al Cancer 86:288-299, 1999; Byers et al Hematol Oncol Clin. North
Am. 12:717-735, 1998; each of which is incorporated herein by reference). Accordingly, 
though melanoma is thought to present with different "taxonomic" forms, these are 
considered part of a continuous spectrum rather than discrete entities (Weyers et al Cancer 
86:288-299, 1999; incorporated herein by reference). Improved characterization and 
understanding of this potentially deadly disease would be valuable. 



Summary of the Invention 

The present invention provides a system for diagnosing aggressive forms of malignant 
melanoma based on the expression of certain marker genes within a tumor sample. In one 
embodiment, expression levels are determined for one or more of the following genes: 
Wnt5a (Seq. ID No.: 1, 2, & 3), MART-1 (Seq. ID No.: 4 & 5), pirin (Seq. ID No.: 6 & 7), 
HADHB (Seq. ID No.: 8 & 9), CD63 (Seq. ID No.: 10 & 11), EDNRB (Seq. ID No.: 12 &
13), PGAM1 (Seq. ID No.: 14 & 15), HXB (Seq. ID No.: 16 & 17), RXRA (Seq. ID No.:
18 & 19), integrin b1 (Seq. ID No.: 20 & 21), syndecan 4 (Seq. ID No.: 22 & 23),
tropomyosin 1 (Seq. ID No.: 24 & 25), AXL (Seq. ID No.: 26 & 27), EphA2 (Seq. ID No.:
28 & 29), GAP43 (Seq. ID No.: 30 & 31), PFKL (Seq. ID No.: 32 & 33), synuclein a (Seq.
ID No.: 34 & 35), annexin A2 (Seq. ID No.: 36 & 37), CD20 (Seq. ID No.: 38 & 39), and 
RAB2 (Seq. ID No.: 40 & 41). In certain preferred embodiments, expression of a plurality 
of these genes is detected. In particularly preferred embodiments, Wnt5a is one of the genes 
whose expression is detected. According to the present invention, overexpression of Wnt5a
in a tumor sample indicates a more aggressive form of the disease. 

The present invention also provides a system for selecting a treatment protocol for a 
patient diagnosed with malignant melanoma based on the expression pattern of certain 
marker genes in a tumor sample. For example, tumors overexpressing Wnt5a may be treated
more aggressively or with specific agents such as inhibitors of Wnt5a expression. Inhibitors
of Wnt5a activity include anti-sense agents, RNA inhibition agents, small molecule inhibitors
of Wnt5a activity, gene therapy, etc.

In another aspect, the present invention provides a system for identifying and then
treating aggressive forms of malignant melanoma by administering inhibitors of Wnt5a
activity to a subject.

In another aspect, the present invention provides a system for identifying compounds
useful in the treatment of cancer, particularly aggressive forms of malignant melanoma
expressing Wnt5a. In the inventive method, a cell expressing Wnt5a is contacted with an
agent being screened for activities useful in the treatment of cancer, such as decreasing or
inhibiting Wnt5a expression and/or activity. The agent may be a polynucleotide, protein,
peptide, natural product, small molecule, etc. The level of Wnt5a expression or activity may
be assayed using any available technique, including but not limited to, Northern blot analysis,
enzyme activity, expression of a reporter gene, etc.

The present invention also provides kits useful in diagnosing or identifying cancers or
more aggressive forms of cancer. The kits may be used to identify more aggressive forms of
malignant melanoma. The kit may include a gene chip with nucleic acid sequences of genes
of interest including Wnt5a, MART-1, pirin, HADHB, CD63, EDNRB, PGAM1, HXB,
RXRA, integrin b1, syndecan 4, tropomyosin 1, AXL, EphA2, GAP43, PFKL, synuclein a,
annexin A2, CD20, and RAB2, or a subset thereof. The kit may also or alternatively include
primers, enzymes, and reagents for identifying, amplifying, labeling, or sequencing nucleic
acids. Some kits may also include reagents for purifying nucleic acids such as mRNA.
Rather than detecting gene expression, the kit may be used to determine protein levels and
may therefore include antibodies directed against the proteins encoded by the genes Wnt5a,
MART-1, pirin, HADHB, CD63, EDNRB, PGAM1, HXB, RXRA, integrin b1, syndecan 4,
tropomyosin 1, AXL, EphA2, GAP43, PFKL, synuclein a, annexin A2, CD20, and RAB2, or
a subset thereof.

Definitions 

"Animal": The term animal, as used herein, refers to humans as well as non-human 
animals, including, for example, mammals, birds, reptiles, amphibians, and fish. Preferred 
non-human animals are mammals (e.g., a rodent, a mouse, a rat, a rabbit, a monkey, a dog,
a cat, a primate, or a pig). An animal may be a transgenic animal. In certain embodiments, 
non-human animals may be laboratory animals, raised by humans in a controlled 
environment other than their natural habitat. 

"Antibody": The term antibody refers to an immunoglobulin, whether natural or 
wholly or partially synthetically produced. All derivatives thereof which maintain specific 
binding ability are also included in the term. The term also covers any protein having a 
binding domain which is homologous or largely homologous to an immunoglobulin binding 
domain. These proteins may be derived from natural sources, or partly or wholly 
synthetically produced. An antibody may be monoclonal or polyclonal. The antibody may 
be a member of any immunoglobulin class, including any of the human classes: IgG, IgM, 
IgA, IgD, and IgE. The antibody may be a fragment of an antibody such as an Fab fragment
or a recombinantly produced scFv fragment. 

"Cancer": Cancer refers to a malignant tumor (e.g., lung cancer) or growth of cells 
(e.g., leukemia). Cancers tend to be less differentiated than benign tumors, grow more 
rapidly, show infiltration, invasion and destruction, and may metastasize. Cancers include, 
but are not limited to, fibrosarcoma, myxosarcoma, angiosarcoma, leukemia, squamous cell 
carcinoma, basal cell carcinoma, malignant melanoma, renal cell carcinoma, hepatocellular 
carcinoma, etc. 

"Effective amount": In general, the "effective amount" of an active agent refers to the 
amount necessary to elicit a desired biological response. As will be appreciated by those of 
ordinary skill in this art, the absolute amount of a Wnt5a inhibitor that is effective may vary 
depending on such factors as the desired biological endpoint, the agent to be delivered, the 
target tissue, etc. Those of ordinary skill in the art will further understand that an "effective 
amount" may be administered in a single dose, or may be achieved by administration of 
multiple doses. For example, in the case of anti-neoplastic agents, the effective amount may 
be the amount of agent needed to reduce the size of the primary tumor, to reduce the size of a 
secondary tumor, to reduce the number of metastases, to reduce the growth rate of a tumor, to 
reduce the ability of the primary tumor to metastasize, to increase life expectancy, etc.

"Marker gene": A "marker gene" may be any gene or gene product (e.g., protein, 
peptide, mRNA) that indicates a particular diseased or physiological state (e.g., carcinoma, 
normal, dysplasia) or indicates a particular cell type, tissue type, or origin. The expression or 
lack of expression of a marker gene may indicate a particular physiological or diseased state 
of a patient, organ, tissue, or cell. Preferably, the expression or lack of expression may be 
determined using standard techniques such as RT-PCR, sequencing, immunochemistry, gene 
chip analysis, etc. In certain embodiments, the level of expression of a marker gene is
quantifiable. 

"Peptide" or "protein": According to the present invention, a "peptide" or "protein" 
comprises a string of at least three amino acids linked together by peptide bonds. The terms
"protein" and "peptide" may be used interchangeably. Peptide may refer to an individual
peptide or a collection of peptides. Inventive peptides preferably contain only natural amino
acids, although non-natural amino acids (i.e., compounds that do not occur in nature but that
can be incorporated into a polypeptide chain; see, for example,
http://www.cco.caltech.edu/~dadgrp/Unnatstruct.gif, which displays structures of non-natural
amino acids that have been successfully incorporated into functional ion channels) and/or
amino acid analogs as are known in the art may alternatively be employed. Also, one or
more of the amino acids in an inventive peptide may be modified, for example, by the
addition of a chemical entity such as a carbohydrate group, a phosphate group, a farnesyl
group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or
other modification, etc. In a preferred embodiment, the modifications of the peptide lead to a
more stable peptide (e.g., greater half-life in vivo). These modifications may include 
cyclization of the peptide, the incorporation of D-amino acids, etc. None of the modifications 
should substantially interfere with the desired biological activity of the peptide. 

"Polynucleotide" or "oligonucleotide": Polynucleotide or oligonucleotide refers to a 
polymer of nucleotides. Typically, a polynucleotide comprises at least three nucleotides.
The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine,
cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine),
nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine,
3-methyl adenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine,
C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine,
8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine), chemically
modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, 
modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose), or 
modified phosphate groups (e.g., phosphorothioates and 5'-N-phosphoramidite linkages). 

"Small molecule": As used herein, the term "small molecule" refers to organic 
compounds, whether naturally-occurring or artificially created (e.g., via chemical synthesis) 
that have relatively low molecular weight and that are not proteins, polypeptides, or nucleic 
acids. Typically, small molecules have a molecular weight of less than about 1500 g/mol. 
Also, small molecules typically have multiple carbon-carbon bonds. 

"Tumor": As used in the present application, "tumor" refers to an abnormal growth 
of cells. The growth of the cells of a tumor typically exceeds the growth of normal tissue and
tends to be uncoordinated. The tumor may be benign (e.g., lipoma, fibroma, myxoma, 
lymphangioma, meningioma, nevus, adenoma, leiomyoma, mature teratoma, etc.) or 
malignant (e.g., malignant melanoma, ovarian cancer, carcinoma in situ, carcinoma, 
adenocarcinoma, liposarcoma, mesothelioma, squamous cell carcinoma, basal cell carcinoma, 
colon cancer, lung cancer, etc.). 



Brief Description of the Drawing 

Figure 1 shows the clustering of gene expression data. a. Hierarchical clustering
dendrogram with the cluster of 19 melanomas at the center. b. MDS three-dimensional plot
of all 31 cutaneous melanoma samples showing major clustering of 19 samples (blue, within
cylinder), and remaining 12 samples (gold). c. A plot of the observed and expected number
of genes producing a given number of classification errors for a partition of the 31
melanomas into two groups of 12 and 19. Red triangles, observed clusters; filled bars,
randomly produced clusters; open circles, predicted results for randomly variable gene
expression. d. Introduction of random Gaussian noise followed by cuts from the top of the
original tree (resulting in k clusters), to determine discrepant pairs after perturbation (see
Supplementary Information in Examples).

Figure 2 illustrates the identification of genes which discriminate melanoma clusters.
a. MDS analysis ranking genes according to their impact on minimizing cluster volume and
maximizing center-to-center inter-cluster distance. b. Top 22 genes obtained by these
criteria listed in order of decreasing weight (for a full list, see Supplementary Information in
Examples). Right, data from cutaneous melanomas identified on the horizontal axis and
sorted by cluster (described in Maniotis et al "Vascular channel formation by human
melanoma cells in vivo and in vitro: vasculogenic mimicry" Am. J. Pathol 155:739-752, 
1999; incorporated herein by reference). Left, data from uveal melanomas expressed as the 
ratio of highly invasive to less invasive. Red, high ratios; green, low ratios (intensity of 
saturation scaled according to the ratio). The three genes not scored in the uveal samples 
were not included in the print design of the cutaneous samples. 

Figure 3. Guiding gene cluster selection. a. Two-dimensional cluster analysis of
cutaneous melanoma samples (horizontal axis) and genes (vertical axis, presented in
segments). b-e. Data from a queried at regions corresponding to four top discriminators of
the major cluster: MART-1 (b), CD63 (c), tropomyosin (d), and Wnt5a (e). Note that these
clusters include other genes from the discriminator list (bold). The major cluster of 19
samples is visually apparent on the left of this display. The full list of gene names and
corresponding calculated ratio information is provided in the Supplementary Information in
the Examples.

Figure 4 shows the variation in biological properties of melanoma clusters. a-c. A
representative member of the major melanoma cluster (UACC-1022). d-f. A sample falling
outside of the major cluster (M93-047). The two groups differ in the ability to migrate into a
scratch wound (a, d), contract collagen gels (b, e) and form tubular networks (c, f). Results
of these and additional cell mobility/invasion assays are included in Table 1. Tubular
network formation (vasculogenic mimicry (Maniotis et al. "Vascular channel formation by
human melanoma cells in vivo and in vitro: vasculogenic mimicry" Am. J. Pathol. 155:739-
752, 1999; incorporated herein by reference), f) and collagen gel contraction (related to the
patterning of vascular channels, e) were observed only outside the major cluster (Table 1).

Figure 5 shows a Kaplan-Meier survival plot for a total of 15 cases, 10 from Group A 
and 5 from Group B. No statistically significant association between group and survival was 
found (p = 0.135).

Figure 6 shows the data obtained from the top 22 genes with Wnt5a at the top of the 
list. The figure also shows a diagram of the Wnt5a and Wnt1 signaling pathways.

Figure 7 shows the data from real time PCR analysis of three cell lines, one with low
Wnt5a expression (which scored as having low expression in the gene chip analysis), one
with high Wnt5a expression (which scored as having high expression in the gene chip
analysis), and one with intermediate Wnt5a expression, an originally low scoring cell line
which had been transfected with a vector designed to express Wnt5a. The parental and
transfected cell lines were also analyzed for WNT5A protein abundance using Western blot
analysis and immunohistochemical staining.

Figure 8 shows the dramatic changes in cell morphology and cytoskeletal
organization upon transfection of the parental cell line with a vector driving Wnt5a
expression. The parental cell line is spindle shaped with few points of attachment to the
culture plate and disorganized actin filaments. The transfectants are broader and flatter with
many extensions and highly polarized actin filaments.

Figure 9 shows the results of experiments done to look at possible cross talk between
the Wnt5a and Wnt1 pathways. Beta-catenin was localized to the cytoplasm indicating that
the Wnt1 pathway is not active. The downstream target of Wnt5a, protein kinase C, was also
observed to be phosphorylated, especially the mu and alpha/beta isoforms, indicating that the
expected Wnt5a pathway is active.

Figure 10 shows scratch assay and Boyden chamber assay results for the parent cell
line as well as the transfected cell line. The results from these two standard assays show that
increased cell movement and invasiveness correlate with increased Wnt5a expression.

Figure 11 shows that the transition from low to high Wnt5a expression is not
associated with increasing amounts of the G protein coupled receptor, frizzled 5 (fzd5). Also
shown are results indicating that an antibody to fzd5 can attenuate or reverse the phenotype
that increased Wnt5a would normally produce.

Detailed Description of Certain Preferred Embodiments of the Invention 

The present invention provides systems for identifying and treating cancers based on 
the expression of marker genes in the cancer cells. In a particular embodiment, the cancer to 
be categorized is malignant melanoma. The invention allows for the identification of more 
aggressive forms of cancer and profiling the affected patient so that a proper treatment 
regimen can be initiated. The present invention also provides for kits useful in practicing the
inventive methods. 

Diagnosing and Identifying Forms of Cancer 

In diagnosing or identifying a particular cancer or tumor, a test sample containing at
least one cell from the tumor is provided to obtain a genetic sample. The test sample may be
obtained using any technique known in the art including biopsy, blood sample, sample of
bodily fluid (e.g., urine, lymph, ascites, cerebrospinal fluid, pleural effusion, sputum, stool,
tears, sweat, pus, etc.), surgical excision, needle biopsy, scraping, etc. A genetic sample is
then obtained from the test sample. The genetic sample comprises a nucleic acid, preferably RNA
and/or DNA. For example, in determining the expression of marker genes one can obtain
mRNA from the test sample, and the mRNA may be reverse transcribed into cDNA for
further analysis. In another embodiment, the mRNA itself is used in determining the
expression of marker genes. In some embodiments, the expression level of a particular
marker gene may be determined by measuring the level or presence of a gene product (e.g.,
protein), thereby eliminating the need to obtain a genetic sample from the test sample.

The test sample is preferably a sample representative of the tumor or cancer as a 
whole. Preferably there is enough of the test sample to obtain a large enough genetic sample 
to accurately and reliably determine the expression levels of marker genes of interest in the 
cancer or tumor. In certain embodiments, multiple samples may be taken from the same 
tumor in order to obtain a representative sampling of the tumor. 

A genetic sample may be obtained from the test sample using any techniques known 
in the art (Ausubel et al. Current Protocols in Molecular Biology (John Wiley & Sons, Inc., 
New York, 1999); Molecular Cloning: A Laboratory Manual, 2nd Ed., ed. by Sambrook, 
Fritsch, and Maniatis (Cold Spring Harbor Laboratory Press: 1989); Nucleic Acid
Hybridization (B. D. Hames & S. J. Higgins eds. 1984); the treatise, Methods in Enzymology 
(Academic Press, Inc., N.Y.); each of which is incorporated herein by reference). The 
nucleic acid may be purified from whole cells using DNA or RNA purification techniques. 
The genetic sample may also be amplified using PCR or in vivo techniques requiring 
subcloning. In a preferred embodiment, the genetic sample is obtained by isolating mRNA 
from the cells of the test sample and reverse transcribing the RNA into DNA in order to 
create cDNA (Khan et al Biochem. Biophys. Acta 1423:17-28, 1999; incorporated herein by 
reference). 

Once a genetic sample has been obtained, it can be analyzed for the presence or 
absence of particular marker genes. The analysis may be performed using any techniques 
known in the art including, but not limited to, sequencing, PCR, RT-PCR, quantitative PCR, 
restriction fragment length polymorphism, hybridization techniques, Northern blot, 
microarray technology, DNA microarray technology, etc. In determining the expression level 
of a marker gene or genes in a genetic sample, the level of expression may be normalized by 
comparison to the expression of another gene such as a well known, well characterized gene 
or a housekeeping gene. 
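
The normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the sample names, and the intensity values are hypothetical and do not come from the Examples, and the log2 ratio is one common convention rather than the specific calculation used by the inventors.

```python
import math


def normalized_log2_expression(marker_signal, housekeeping_signal):
    """Return log2(marker / housekeeping) for a single sample.

    Both signals are assumed to be positive raw intensities measured
    in the same genetic sample.
    """
    if marker_signal <= 0 or housekeeping_signal <= 0:
        raise ValueError("signals must be positive")
    return math.log2(marker_signal / housekeeping_signal)


# Hypothetical raw intensities (illustrative values only).
samples = {
    "sample_1": {"marker": 800.0, "housekeeping": 200.0},
    "sample_2": {"marker": 150.0, "housekeeping": 300.0},
}

# Normalized expression per sample: positive means the marker signal
# exceeds the housekeeping signal, negative means it falls below it.
normalized = {
    name: normalized_log2_expression(v["marker"], v["housekeeping"])
    for name, v in samples.items()
}
```

Dividing by the housekeeping signal before comparing samples removes differences that come only from how much nucleic acid was loaded or recovered.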

The expression data from a particular marker gene or group of marker genes may be 
analyzed using statistical methods described below in the Examples in order to determine the 
phenotype or characteristic of a particular tumor or cancer. Methods used in classifying 
tumors based on gene expression data are described in Ben-Dor et al. J. Comput. Biol. 7(3 & 
4):559-584, 2000; incorporated herein by reference. The analyzed data may also be used to 
select/profile patients for a particular treatment protocol. 

For example, the present invention demonstrates that marker gene Wnt5a is expressed
at high levels in more aggressive forms of malignant melanomas. A patient with malignant
melanoma may have the expression level of Wnt5a in the cells of his/her tumor determined in
order to help determine the prognosis and/or treatment plan for his/her particular disease.
The expression level of Wnt5a would preferably be one of several factors used in deciding
the prognosis or treatment plan of a patient. Preferably a trained and fully licensed physician
would be consulted in determining the patient's prognosis and treatment plan. A high level
of expression of Wnt5a may indicate a worse prognosis and suggest a more aggressive
treatment plan. The treatment plan may also include inhibitors of Wnt5a activity such as anti-
sense agents and gene therapy directed against Wnt5a. Small molecule inhibitors of Wnt5a
activity may also be used in the treatment plan as well as pharmaceuticals that inhibit the
Wnt5a pathway either upstream or downstream of Wnt5a itself.

Marker Genes 

The present invention provides several marker genes that correlate with particularly
aggressive forms of malignant melanoma. These markers may also be useful in categorizing
tumors or cancers other than malignant melanoma. For example, inventive marker
genes may be useful in categorizing other types of skin cancer. Preferred marker genes
include Wnt5a, MART-1, pirin, HADHB, CD63, EDNRB, PGAM1, HXB, RXRA, integrin
b1, syndecan 4, tropomyosin 1, AXL, EphA2, GAP43, PFKL, synuclein a, annexin A2,
CD20, and RAB2, and combinations thereof. Other potential marker genes are listed in the
Examples below. Particular sets of marker genes may be defined using statistical methods as
described in the Examples in order to decrease or increase the specificity or sensitivity of the
set. For example, a particular set of marker genes highly specific for aggressive forms of
malignant melanoma may be less sensitive (i.e., a negative result may occur in the presence
of an aggressive form of melanoma).
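
The trade-off between specificity and sensitivity mentioned above can be made concrete with a short Python sketch. The labels below are invented for illustration and are not data from the Examples; the function name is likewise hypothetical. The sketch simply counts true and false calls for a marker set against a known classification.

```python
def sensitivity_and_specificity(truth, calls):
    """Compute (sensitivity, specificity) for boolean labels.

    truth[i] is True when sample i is truly an aggressive form;
    calls[i] is True when the marker set calls sample i aggressive.
    """
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have the same length")
    pairs = list(zip(truth, calls))
    tp = sum(1 for t, c in pairs if t and c)          # true positives
    fn = sum(1 for t, c in pairs if t and not c)      # false negatives
    tn = sum(1 for t, c in pairs if not t and not c)  # true negatives
    fp = sum(1 for t, c in pairs if not t and c)      # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one sample of each true class")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outcome for a stringent marker set: it never calls a
# non-aggressive tumor aggressive, but it misses half of the aggressive ones.
truth = [True, True, True, True, False, False, False, False]
calls = [True, True, False, False, False, False, False, False]
sensitivity, specificity = sensitivity_and_specificity(truth, calls)
```

In this invented case the set is fully specific but only half as sensitive, which is the situation the paragraph describes: a negative result can occur even though an aggressive melanoma is present.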

Different subsets of marker genes may be developed that show optimal function with 
different races, ethnic groups, sexes, geographic groups, stages of disease, types of cancer, 
cell types, etc. Subsets of marker genes may also be developed to be sensitive to the effect of 
a particular therapeutic regimen on disease progression. 

One particularly useful marker gene in the diagnosis of aggressive forms of malignant
melanoma is Wnt5a. The Wnt genes make up a large family of highly conserved genes that
have been studied extensively in development. The first member, int-1, was discovered as a
common integration site of mouse mammary tumor virus (MMTV) in mammary epithelial
adenocarcinomas (Nusse and Varmus Cell 69:1073-1087, 1992; incorporated herein by
reference). Int-1 is highly homologous to the Drosophila developmental gene wingless that
is involved in pattern formation. The combination of wingless and int-1 gives rise to the term
Wnt. Homologues of Wnt genes have been isolated in Drosophila, Xenopus, chicken, mouse,
and humans (Nusse and Varmus Cell 69:1073-1087, 1992; incorporated herein by reference).
In humans, there are nine Wnt genes known including Wnt5a (Clark et al. Genomics 18:249-
260, 1993; Lejeune et al. Clin. Cancer Res. 1:215-222, 1995; each of which is incorporated
herein by reference). Wnt5a has been found to be up-regulated in lung, colon, and prostate 
carcinomas and melanomas (Iozzo et al Cancer Res. 55:3495-3499, 1995; incorporated 
herein by reference). 

The sequence of the mRNA of Homo sapiens wingless MMTV integration site
family, member 5a (Wnt5a) is shown below:

   1 attaattctg gctccacttg ttgctcggcc caggttgggg agaggacgga gggtggccgc
  61 agcgggttcc tgagtgaatt acccaggagg gactgagcac agcaccaact agagaggggt
 121 cagggggtgc gggactcgag cgagcaggaa ggaggcagcg cctggcacca gggctttgac
 181 tcaacagaat tgagacacgt ttgtaatcgc tggcgtgccc cgcgcacagg atcccagcga
 241 aaatcagatt tcctggtgag gttgcgtggg tggattaatt tggaaaaaga aactgcctat
 301 atcttgccat caaaaaactc acggaggaga agcgcagtca atcaacagta aacttaagag
 361 acccccgatg ctcccctggt ttaacttgta tgcttgaaaa ttatctgaga gggaataaac
 421 atcttttcct tcttccctct ccagaagtcc attggaatat taagcccagg agttgctttg
 481 gggatggctg gaagtgcaat gtcttccaag ttcttcctag tggctttggc catatttttc
 541 tccttcgccc aggttgtaat tgaagccaat tcttggtggt cgctaggtat gaataaccct
 601 gttcagatgt cagaagtata tattatagga gcacagcctc tctgcagcca actggcagga
 661 ctttctcaag gacagaagaa actgtgccac ttgtatcagg accacatgca gtacatcgga
 721 gaaggcgcga agacaggcat caaagaatgc cagtatcaat tccgacatcg acggtggaac
 781 tgcagcactg tggataacac ctctgttttt ggcagggtga tgcagatagg cagccgcgag
 841 acggccttca catacgccgt gagcgcagca ggggtggtga acgccatgag ccgggcgtgc
 901 cgcgagggcg agctgtccac ctgcggctgc agccgcgccg cgcgccccaa ggacctgccg
 961 cgggactggc tctggggcgg ctgcggcgac aacatcgact atggctaccg ctttgccaag
1021 gagttcgtgg acgcccgcga gcgggagcgc atccacgcca agggctccta cgagagtgct
1081 cgcatcctca tgaacctgca caacaacgag gccggccgca ggacggtgta caacctggct
1141 gatgtggcct gcaagtgcca tggggtgtcc ggctcatgta gcctgaagac atgctggctg
1201 cagctggcag acttccgcaa ggtgggtgat gccctgaagg agaagtacga cagcgcggcg
1261 gccatgcggc tcaacagccg gggcaagttg gtacaggtca acagccgctt caactcgccc
1321 accacacaag acctggtcta catcgacccc agccctgact actgcgtgcg caatgagagc
1381 accggctcgc tgggcacgca gggccgcctg tgcaacaaga cgtcggaggg catggatggc
1441 tgcgagctca tgtgctgcgg ccgtgggtac gaccagttca agaccgtgca gacggagcgc
1501 tgccactgca agttccactg gtgctgctac gtcaagtgca agaagtgcac ggagatcgtg
1561 gaccagtttg tgtgcaagta gtgggtgcca cccagcactc agccccgctc ccaggacccg
1621 cttatttata gaaagtacag tgattctggt ttttggtttt tagaaatatt ttttattttt
1681 ccccaagaat tgcaaccgga accatttttt ttcctgttac catctaagaa ctctgtggtt
1741 tattattaat attataatta ttatttggca ataatggggg tgggaaccac gaaaaatatt
1801 tattttgtgg atctttgaaa aggtaataca agacttcttt tggatagtat agaatgaagg
1861 gggaaataac acatacccta acttagctgt gtgggacatg gtacacatcc agaaggtaaa
1921 gaaatacatt ttctttttct caaatatgcc atcatatggg atgggtaggt tccagttgaa
1981 agagggtggt agaaatctat tcacaattca gcttctatga ccaaaatgag ttgtaaattc
2041 tctggtgcaa gataaaaggt cttgggaaaa caaaacaaaa caaaacaaac ctcccttccc
2101 cagcagggct gctagcttgc tttctgcatt ttcaaaatga taatttacaa tggaaggaca
2161 agaatgtcat attctcaagg aaaaaaggta tatcacatgt ctcattctcc tcaaatattc
2221 catttgcaga cagaccgtca tattctaata gctcatgaaa tttgggcagc agggaggaaa
2281 gtccccagaa attaaaaaat ttaaaactct tatgtcaaga tgttgatttg aagctgttat
2341 aagaattggg attccagatt tgtaaaaaga cccccaatga ttctggacac tagatttttt
2401 gtttggggag gttggcttga acataaatga aatatcctgt attttcttag ggatacttgg
2461 ttagtaaatt ataatagtag aaataataca tgaatcccat tcacaggttt ctcagcccaa
2521 gcaacaaggt aattgcgtgc cattcagcac tgcaccagag cagacaacct atttgaggaa
2581 aaacagtgaa atccaccttc ctcttcacac tgagccctct ctgattcctc cgtgttgtga
2641 tgtgatgctg gccacgtttc caaacggcag ctccactggg tcccctttgg ttgtaggaca
2701 ggaaatgaaa cattaggagc tctgcttgga aaacagttca ctacttaggg atttttgttt
2761 cctaaaactt ttattttgag gagcagtagt tttctatgtt ttaatgacag aacttggcta
2821 atggaattca cagaggtgtt gcagcgtatc actgttatga tcctgtgttt agattatcca
2881 ctcatgcttc tcctattgta ctgcaggtgt accttaaaac tgttcccagt gtacttgaac
2941 agttgcattt ataagggggg aaatgtggtt taatggtgcc tgatatctca aagtcttttg
3001 tacataacat atatatatat atacatatat ataaatataa atataaatat atctcattgc
3061 agccagtgat ttagatttac agcttactct ggggttatct ctctgtctag agcattgttg
3121 tccttcactg cagtccagtt gggattattc caaaagtttt ttgagtcttg agcttgggct
3181 gtggccccgc tgtgatcata ccctgagcac gacgaagcaa cctcgtttct gaggaagaag
3241 cttgagttct gactcactga aatgcgtgtt gggttgaaga tatctttttt tcttttctgc
3301 ctcacccctt tgtctccaac ctccatttct gttcactttg tggagagggc attacttgtt
3361 cgttatagac atggacgtta agagatattc aaaactcaga agcatcagca atgtttctct
3421 tttcttagtt cattctgcag aatggaaacc catgcctatt agaaatgaca gtacttatta
3481 attgagtccc taaggaatat tcagcccact acatagatag cttttttttt tttttttttt
3541 ttttaataag gacacctctt tccaaacagg ccatcaaata tgttcttatc tcagacttac
3601 gttgttttaa aagtttggaa agatacacat cttttcatac ccccccttag gaggttgggc
3661 tttcatatca cctcagccaa ctgtggctct taatttattg cataatgata tccacatcag
3721 ccaactgtgg ctctttaatt tattgcataa tgatattcac atcccctcag ttgcagtgaa
3781 ttgtgagcaa aagatcttga aagcaaaaag cactaattag tttaaaatgt cacttttttg
3841 gtttttatta tacaaaaacc atgaagtact ttttttattt gctaaatcag attgttcctt
3901 tttagtgact catgtttatg aagagagttg agtttaacaa tcctagcttt taaaagaaac
3961 tatttaatgt aaaatattct acatgtcatt cagatattat gtatatcttc tagcctttat
4021 tctgtacttt taatgtacat atttctgtct tgcgtgattt gtatatttca ctggtttaaa
4081 aaacaaacat cgaaaggctt attccaaatg gaag



The translated sequence of Wnt5a is as follows:

MAGSAMSSKFFLVALAIFFSFAQVVIEANSWWSLGMNNPVQMSE
VYIIGAQPLCSQLAGLSQGQKKLCHLYQDHMQYIGEGAKTGIKECQYQFRHRRWNCST
VDNTSVFGRVMQIGSRETAFTYAVSAAGVVNAMSRACREGELSTCGCSRAARPKDLPR
DWLWGGCGDNIDYGYRFAKEFVDARERERIHAKGSYESARILMNLHNNEAGRRTVYNL
ADVACKCHGVSGSCSLKTCWLQLADFRKVGDALKEKYDSAAAMRLNSRGKLVQVNSRF
NSPTTQDLVYIDPSPDYCVRNESTGSLGTQGRLCNKTSEGMDGCELMCCGRGYDQFKT
VQTERCHCKFHWCCYVKCKKCTEIVDQFVCK (Seq. ID No.: 3)

Other sequences homologous to the above sequences may also be used in the present 
invention. Preferably the sequence is at least 70% identical to the human Wnt5a DNA and 
protein sequences listed above. More preferably the sequence is at least 80%, 90%, 95%, 
97%, 98%, 99%, or >99% identical. A homolog of Wnt5a may also be identified by its 
activity. In another preferred embodiment, the homolog of Wnt5a is identified by its location
in the genome (e.g., location on the chromosome).
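
The identity thresholds above can be illustrated with a short calculation. The following is a minimal sketch only: it assumes the two sequences have already been aligned to equal length (gaps written as "-"), and it counts identity over all aligned columns that are not gap-versus-gap. The specification does not state which alignment method or denominator is intended, so both choices, and the function name percent_identity, are assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns where both sequences have a gap are ignored, and a
    residue aligned against a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # nothing to compare in this column
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_threshold(aligned_a, aligned_b, threshold_pct=70.0):
    """True if the aligned pair is at least threshold_pct identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold_pct
```

For example, a candidate that differs from the first ten residues of Seq. ID No.: 3 at one position would score 90% under these assumptions and would satisfy the 70%, 80% and 90% thresholds but not the 95% threshold.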

The present invention also provides a novel method of identifying compounds useful
in the treatment of patients with cancer. In certain embodiments, the cancer is malignant 
melanoma. In other embodiments, the cancer is a malignant melanoma expressing Wnt5a. In 
particular, the inventive method identifies compounds directed against Wnt5a or Wnt5a 
activity specifically, or more generally, against downstream or upstream signals in the Wnt5a 
pathway. 

Any compound, moiety, or entity can be screened for activity against Wnt5a 
according to the present invention. For example, polynucleotides, peptides, proteins, natural 
products, chemical compounds, small molecules, polymers, biomolecules, etc. may be tested. 
The agents to be screened may be prepared by purification or synthesis, or may be obtained 
from commercial or other stock sources. 

The assay used to screen the agents may be an in vitro or in vivo assay. For example,
an in vitro assay may utilize purified or partially purified WNT5A protein. The WNT5A
protein may be obtained by purifying the protein from a natural source or from a cell, such as
bacteria, mammalian cells, yeast, or fungi, overexpressing WNT5A. Methods for
overexpressing and purifying the proteins encoded by cloned genes are well known in the art
(see, Ausubel et al. Current Protocols in Molecular Biology (John Wiley & Sons, Inc., New
York, 1999); Molecular Cloning: A Laboratory Manual, 2nd Ed., ed. by Sambrook, Fritsch,
and Maniatis (Cold Spring Harbor Laboratory Press: 1989); each of which is incorporated
herein by reference). Agents may be screened for their ability to bind the WNT5A protein or
to enhance or prevent an interaction between WNT5A and another protein, peptide,
polynucleotide, or chemical compound. Agents may also be screened for their ability to
affect effects further downstream of WNT5A. Agents may be screened using high-throughput
techniques known in the art.

In one embodiment of an in vivo assay, a cell expressing Wnt5a is contacted with an
agent to be tested. The level of Wnt5a expression or activity is then determined using an
assay known in the art. These assays may include but are not limited to Northern blot
analysis, enzyme activity, quantitative PCR, Western blot analysis, etc. As would be
appreciated by one of skill in this art, experiments designed to screen for agents directed
against Wnt5a may include proper positive and/or negative controls. The experiment may
also include testing a particular agent at several different concentrations in the range of about
1 nM to about 100 mM, preferably about 1 nM to about 1 mM, more preferably about 1 nM to
about 100 nM.

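The concentration ranges above span several orders of magnitude, so a logarithmically spaced series is one natural way to choose test points. The sketch below is illustrative only: the function name log_spaced_concentrations and the choice of evenly spaced points per decade are assumptions, not part of the described method.

```python
import math


def log_spaced_concentrations(low_molar, high_molar, points_per_decade=1):
    """Return concentrations (in mol/L) spaced evenly on a log10 scale.

    Assumption: the range covers a whole number of steps, so the series
    ends at (approximately) high_molar.
    """
    if low_molar <= 0 or high_molar <= low_molar:
        raise ValueError("require 0 < low_molar < high_molar")
    decades = math.log10(high_molar / low_molar)
    steps = round(decades * points_per_decade)
    return [low_molar * 10 ** (i / points_per_decade) for i in range(steps + 1)]


# The three ranges named in the text, expressed in mol/L.
BROAD_RANGE = (1e-9, 1e-1)       # about 1 nM to about 100 mM
PREFERRED_RANGE = (1e-9, 1e-3)   # about 1 nM to about 1 mM
NARROW_RANGE = (1e-9, 1e-7)      # about 1 nM to about 100 nM
```

With one point per decade, the broadest range gives nine test concentrations and the narrowest gives three.
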
In one preferred embodiment, the cells used in the screening method are skin cells, 
more preferably malignant melanoma cells. In certain embodiments, the cells or cell line are 
genetically engineered to express Wnt5a. In certain embodiments, the cells are malignant 
melanoma cells that did not express Wnt5a naturally but have been genetically engineered to 
express Wnt5a. Preferred embodiments of such cells and cell lines are described below in the 
Examples. 

Inventive methods of detecting whether a compound inhibits Wnt5a may include an
assay which assesses the ability of the cells to "chew through", digest, or migrate through
extracellular matrix as described below in the Examples. Assays of this type may include,
but are not limited to, the scratch assay and the Boyden chamber assay. A cell that
overexpresses Wnt5a may be able to digest or migrate through extracellular matrix in its
search for media or nutrients. Agents that inhibit such a cell's ability to digest extracellular
matrix and/or inhibit the activity of Wnt5a may be useful in the treatment
of malignant melanoma expressing Wnt5a. In a preferred embodiment, the agent reduces the
ability of the cell to digest or migrate through extracellular matrix by at least about 50% when
compared to cells that were not contacted with the agent, more preferably by at least about
75%, and most preferably by at least about 90%.

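The 50%, 75% and 90% thresholds above can be applied to any quantitative readout of invasion or migration. The following sketch assumes a single numeric measurement for untreated control cells and for treated cells (for example, a count of cells crossing a matrix barrier). The function names and tier labels are hypothetical and chosen only for illustration.

```python
def percent_reduction(control_value, treated_value):
    """Percent reduction of a readout in treated cells relative to control."""
    if control_value <= 0:
        raise ValueError("control readout must be positive")
    return 100.0 * (control_value - treated_value) / control_value


def inhibition_tier(reduction_pct):
    """Map a percent reduction onto the thresholds named in the text."""
    if reduction_pct >= 90.0:
        return "most preferred (at least about 90%)"
    if reduction_pct >= 75.0:
        return "more preferred (at least about 75%)"
    if reduction_pct >= 50.0:
        return "preferred (at least about 50%)"
    return "below preferred threshold"
```

For instance, if 200 control cells and 50 treated cells cross the matrix, the reduction is 75% and falls in the middle tier.
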
In certain other embodiments, cell morphology or cytoskeletal organization may be 
used to assess the effect of an agent on cells expressing Wnt5a. The cells may be contacted 
with various concentrations of the agent with a control plate of cells contacted with no agent. 
The shape of the cells, number of attachments of each cell to the plate, and/or the 
organization of actin filaments may be assessed to determine the effect of the agent on the 
cells. In other embodiments, downstream signaling molecules in the Wnt5a pathway are 
analyzed to determine the effect of the added agent. In one embodiment, the phosphorylation 
of protein kinase C is used to determine the effect of the agent. 

In other embodiments, agents may be screened for their ability to inhibit or knock out 
the Wnt5a pathway as shown in Figure 6. In one embodiment, agents may be screened for 
their ability to block the binding of WNT5A to its receptor, frizzled 5. An agent able to block 
this binding interaction could possibly attenuate or reverse the phenotypes that increased 
WNT5A would normally produce, such as increased cell movement and invasiveness.

These and other aspects of the present invention will be further appreciated upon 
consideration of the following Examples, which are intended to illustrate certain particular 
embodiments of the invention but are not intended to limit its scope, as defined by the claims. 

Examples 

Example 1 - Molecular Classification of Cutaneous Malignant Melanoma by Gene Expression Profiling

We have proposed that a discrete and previously unrecognizable cancer taxonomy can 
be identified by viewing the systematized data from gene expression experiments (Bittner et 
al. Nature 406:536-540, 3 August 2000; incorporated herein by reference). However, for 
melanoma, inherent or technically induced variation could obscure such a classification as its 
appearance is very similar between patient samples and, in contrast to haematologic cancers 
(Golub et al. "Molecular classification of cancer, class discovery and class prediction by gene
expression monitoring" Science 286:531-537, 1999; Alizadeh et al. "Distinct types of diffuse
large B-cell lymphoma identified by gene expression profiling" Nature 403:503-511, 2000;
each of which is incorporated herein by reference), it has few known recurring genetic
changes. To explore this question, we gathered expression profiles for 38 samples, including
31 melanomas and 7 controls (Table 1). We isolated total messenger RNA directly from
melanoma biopsies or tumor cell cultures, prepared fluorescent complementary DNA from
the message and hybridized it to a microarray containing probes for 8,150 cDNAs
(representing 6,971 unique genes), obtaining quantitative and comparative measurements for 
each gene. 

The tumor cell mRNA was compared with a single reference probe, providing 
normalized measures of the expression of each gene in each sample relative to the standard. 
Analysis of the normalized expression across all genes between samples provided a measure 
of the overall difference in expression pattern between samples. Similarly, the orthogonal
analysis of linear covariance between pairs of genes across all samples provided a measure of 
the similarity of behavior of the genes studied. 

Figure 1 shows the integration of several analytical methods to visualize the overall
expression pattern relationships between cutaneous melanoma tumor samples. Using a
matrix of Pearson correlation coefficients from the complete pairwise comparison of all
experiments (Bittner et al. "Data analysis and integration of steps and arrows" Nature Genet.
22:213-215, 1999; incorporated herein by reference), the 31 melanoma experiments are
displayed as a hierarchical clustering dendrogram (Khan et al. "Gene expression profiling of
alveolar rhabdomyosarcoma with cDNA microarrays" Cancer Res. 58:5009-5013, 1998;
Eisen et al. "Cluster analysis and display of genome-wide expression patterns" Proc. Natl.
Acad. Sci. USA 95:14863-14868, 1998; each of which is incorporated herein by reference)
and as a three-dimensional multidimensional scaling (MDS) plot (Khan et al. "Gene
expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays" Cancer Res.
58:5009-5013, 1998; Everitt, B. Applied Multivariate Data Analysis. (Oxford Univ. Press,
New York, 1992); each of which is incorporated herein by reference). The MDS plot displays the position of
each tumor sample in three-dimensional Euclidean space, with the distance between
experimental samples reflecting their approximate degree of correlation (Khan et al. "Gene
expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays" Cancer Res.
58:5009-5013, 1998; Everitt, B. Applied Multivariate Data Analysis. (Oxford Univ. Press,
New York, 1992); each of which is incorporated herein by reference). The analysis included all genes
meeting a minimum level of expression in each hybridization. We also employed a
non-hierarchical clustering algorithm (termed cluster affinity search technique; CAST) (Ben-Dor
et al. "Clustering gene expression patterns" J. Comput. Biol. 6:281-297, 1999; incorporated
herein by reference) to define experimental clusters. The resulting hierarchical dendrogram
of the 31 melanoma samples (Fig. 1a) demonstrates that 19 samples are tightly clustered at the
bottom of the dendrogram in the area of highest similarity. Likewise, the non-hierarchical
CAST algorithm identified the identical major cluster of 19 melanomas. This cluster is also a
compact, readily separable grouping based on its overall similarity of expression pattern
viewed by MDS (Fig. 1b).

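The matrix of Pearson correlation coefficients described above is the input to both the dendrogram and the MDS plot. A minimal sketch of how such a matrix could be computed is shown here. It assumes each sample is represented by a list of normalized expression values over the same ordered set of genes. The function names and the use of 1 - r as a dissimilarity are illustrative assumptions, not a statement of the software actually used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(profiles):
    """profiles: dict mapping sample name -> list of expression values.

    Returns a dict mapping (sample_a, sample_b) -> Pearson r for every
    ordered pair of samples, including each sample with itself.
    """
    names = list(profiles)
    return {(a, b): pearson(profiles[a], profiles[b]) for a in names for b in names}


def correlation_distance(r):
    """One common dissimilarity for clustering or MDS (an assumption here)."""
    return 1.0 - r
```

Highly correlated samples then have distances near 0, and anti-correlated samples have distances near 2.
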
There is no single established method to estimate the significance of an observed
degree of relationship obtained by cluster prediction techniques (Golub et al. "Molecular
classification of cancer, class discovery and class prediction by gene expression monitoring"
Science 286:531-537, 1999; Bittner et al. "Data analysis and integration of steps and arrows"
Nature Genet. 22:213-215, 1999; each of which is incorporated herein by reference).
Accordingly, we used two independent approaches to test the validity of our cluster
prediction of the 19-element cluster. The first approach (Fig. 1c) examines the power of
individual genes to discriminate the major cluster of 19 from the remaining samples by
examining the frequency of strong classifier genes compared to the expected frequency of
such genes if expression is randomly variable, and to the frequency of strong classifiers in
random partitions of the same samples into new groupings of 19 and 12 (Ben-Dor et al.
"Class Discovery in Gene Expression Data" Proceedings RECOMB 2001, pp. 31-38, 2001;
incorporated herein by reference). The non-randomness of the cluster results is evident.
Specifically, many genes have expression patterns that differ strongly between the initial
sample clusters and thus serve as good classifiers (Fig. 1c, red triangles). However,
expression patterns are not readily found which classify the samples when they are grouped
into random partitions of the same size (Fig. 1c, blue lines). Accordingly, in randomly
formed clusters, expression behavior is essentially indistinguishable from truly random
behavior of genes relative to these clusters (Fig. 1c, compare blue lines with open circles).

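The comparison between the observed partition and random partitions of the same size can be sketched as follows. This example scores each gene by the smallest number of samples misclassified by any single expression threshold, which is one simple way to define a "strong classifier". That scoring rule, the cutoff max_errors, and all function names are assumptions made for illustration, and the scoring used to produce Fig. 1c may differ.

```python
import random


def threshold_errors(values, labels):
    """Fewest misclassifications achievable by thresholding one gene.

    values: expression values, one per sample; labels: 0 or 1 per sample.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    ones_total = sum(label for _, label in pairs)
    best = min(ones_total, n - ones_total)  # threshold outside the data range
    ones_below = 0
    for i in range(1, n):
        ones_below += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate tied values
        zeros_below = i - ones_below
        ones_above = ones_total - ones_below
        zeros_above = (n - i) - ones_above
        # Either side of the threshold may be assigned to either class.
        best = min(best, ones_below + zeros_above, zeros_below + ones_above)
    return best


def count_strong_classifiers(expression, labels, max_errors=0):
    """Number of genes (rows of expression) separating the two groups well."""
    return sum(1 for row in expression if threshold_errors(row, labels) <= max_errors)


def random_partition_counts(expression, labels, n_partitions=100, max_errors=0, seed=0):
    """Strong-classifier counts for random relabelings with the same group sizes."""
    rng = random.Random(seed)
    shuffled = list(labels)
    counts = []
    for _ in range(n_partitions):
        rng.shuffle(shuffled)
        counts.append(count_strong_classifiers(expression, shuffled, max_errors))
    return counts
```

A partition that reflects real structure would be expected to yield a strong-classifier count well above the typical count for shuffled labels.
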
The second approach we used to test the validity of the cluster predictions is based on 
evaluating cluster membership after introducing random perturbations to the data set. For 
each sample, the log-ratio of each gene was perturbed by the introduction of random Gaussian
noise with the mean equal to 0 and the standard deviation equal to 0.15 (an estimate of
variation derived by computing the median standard deviation of the log-ratios for single
genes across all 31 samples). Hierarchical clustering was then performed on the perturbed
data set and a comparison made between the original tree (Fig. 1a) and the perturbed tree.
Comparisons involved cutting the original and perturbed trees into k clusters followed by
computing the proportion of paired samples clustering together in the original tree that did
not cluster together in the perturbed tree (we refer to this measure as a weighted proportion of
discrepant pairs because it gives more weight to larger clusters). The comparison was
repeated over multiple perturbed data sets for each possible cut in the original tree (k = 2, 3,
..., 30). For a given k, the weighted proportion of discrepant pairs was then averaged over the
perturbed data sets resulting in the identification of weighted average discrepant pairs
(WADP*; see Supplementary Information).

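The perturbation and pair-comparison steps can be sketched as below. The clustering step itself is omitted: the sketch assumes cluster labels for the original and perturbed trees (each cut into k clusters) are already available as lists with one label per sample. The function names are hypothetical. Pooling pairs across clusters is one reading of the "weighted" proportion, since larger clusters contribute more pairs.

```python
import random
from itertools import combinations


def perturb_log_ratios(log_ratios, sd=0.15, rng=None):
    """Add Gaussian noise (mean 0, standard deviation sd) to every log-ratio.

    log_ratios: list of rows (one per sample), each a list of per-gene values.
    """
    rng = rng or random.Random()
    return [[value + rng.gauss(0.0, sd) for value in row] for row in log_ratios]


def discrepant_pair_proportion(original_labels, perturbed_labels):
    """Fraction of sample pairs co-clustered originally but split after perturbation."""
    together = 0
    discrepant = 0
    for i, j in combinations(range(len(original_labels)), 2):
        if original_labels[i] == original_labels[j]:
            together += 1
            if perturbed_labels[i] != perturbed_labels[j]:
                discrepant += 1
    return discrepant / together if together else 0.0


def average_discrepant_pairs(original_labels, perturbed_label_sets):
    """Average the proportion over several perturbed data sets for one value of k."""
    proportions = [
        discrepant_pair_proportion(original_labels, labels)
        for labels in perturbed_label_sets
    ]
    return sum(proportions) / len(proportions)
```

A value near 0 for a given k indicates that the clusters obtained at that cut are stable under measurement noise of the assumed size.
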

Clusters that result from cutting the original tree into 9 or fewer groups are very
reproducible (Fig. 1d). It is noteworthy that the rise in WADP* almost exactly coincides with
the division of the major 19-element cluster into smaller sub-clusters. These results strongly 
support the view that the major cluster of melanoma samples identified in this study 
represents a bona fide and highly reproducible grouping. 

We then performed statistical tests to determine whether any clinical or tumour cell
characteristics were specifically associated with the clustered group. Tests for associations
between the major cluster of 19 samples and the remaining 12 melanoma samples were
performed for several in vivo variables, including sex, age, biopsy site, Breslow thickness,
Clark's level and survival. There was no statistically significant association between the
cluster group and any clinical variable. There were also no significant associations with the
in vitro variables, including p16 or β-catenin mutation status, in vitro pigmentation and cell
passage number (see Supplementary Information).

We included two pairs of specimens derived from the same patient in this sample set. 
These are M92-001 and M93-007 (two different samples from the same individual, surgically 
removed one year apart), and TD-1376-3 and TC-1376-3 (the biopsy sample and a cell
culture of the same tumour carried three passages in vitro). Although there was no
significant association between cell passage number and cluster group (P = 0.857, see
Supplementary Information), the TD-1376-3/TC-1376-3 pair was included to serve as
another control for the effects of cell culture. Remarkably, of the 465 pairwise comparisons
among the melanoma samples, the pairs TD-1376-3/TC-1376-3 and M92-001/M93-007 are
the second and third most highly correlated pairs of samples, with nearly identical correlation 
coefficients (Fig. lb). 

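The figure of 465 pairwise comparisons follows from the 31 melanoma samples, since 31 x 30 / 2 = 465. The sketch below checks that count and shows one way to rank sample pairs by correlation. The function names are hypothetical, and the correlation values are assumed to be supplied by the caller.

```python
import math
from itertools import combinations


def number_of_pairs(n_samples):
    """Number of distinct unordered sample pairs."""
    return math.comb(n_samples, 2)


def unordered_pairs(sample_names):
    """All distinct unordered pairs of sample names."""
    return list(combinations(sample_names, 2))


def rank_pairs(correlations):
    """Sort (pair, r) entries from most to least correlated.

    correlations: dict mapping (sample_a, sample_b) -> correlation coefficient.
    """
    return sorted(correlations.items(), key=lambda item: item[1], reverse=True)
```

Ranking all 465 pairs in this way is what allows the two same-patient pairs to be located near the top of the list.
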
On the basis of the linear correlation of global gene expression in Fig. 1, Figs 2 and 3 
illustrate the approach we have used to guide 'gene cluster' interpretation empirically. Fig. 
2a depicts our statistical method for extracting a 'weighted list' of individual genes whose 
variance across all experiments correctly defines the boundary of a given sample
cluster (for details see Supplementary Information). Fig. 2b displays the list of genes with
the most power to define the major melanoma cluster of 19 samples (Fig. 1a and b) in rank
order along the vertical axis. The samples are ordered along the horizontal axis by cluster 
inclusion, and data are presented graphically as coloured images with the colour saturation 
directly proportional to the magnitude of the measured gene expression ratio (brightest red, 
highest R/G ratio; black squares, R/G ratio = 1; brightest greens, lowest R/G ratio). The 
complete list of genes discriminating the major cluster is in the Supplementary Information. 

The weighted gene list can also be used to guide analysis of the larger gene
expression data set. Figure 3a displays all data from the cutaneous melanoma samples in this
study as a coloured image with genes ordered along the vertical axis by similarity of
expression pattern (after Eisen et al. "Cluster analysis and display of genome-wide
expression patterns" Proc. Natl. Acad. Sci. USA 95:14863-14868, 1998; incorporated herein
by reference). However, rather than basing analysis of this large (>300,000 elements) data
set entirely on visual selection, we used genes from the weighted list to index gene cluster
selection. Figure 3b-e illustrates this approach using four genes from the 'weighted list' in
Fig. 2b (MART-1, CD63, tropomyosin and WNT5A), to interrogate the entire gene
expression data set represented in Fig. 3a.



Table 1 Summary of melanoma cases by cluster designation

Case no. | Sex/Age | Biopsy site | Passage no. (Biopsy) | p16 mutation status* | Invasive ability† | Vasculogenic mimicry‡ | Gel contraction§ | Cell motility‖ | Scratch wound (%)¶

Melanoma primary cluster
UACC-502 | M/69 | Cervical node | 3 | Deleted | 2.8 ± .01% |  | ND | ND | 37
M92-001 | F/43 | Ankle | 2 | Deleted | 3.0 ± 0.5% |  | ND | 76.8 ± 2.96 | 22
A-375 | F/54 | Skin | ND | Mutation | 2.8 ± 0.2% |  | ND | 67.80 ± 4.40 | 26
M91-054# | M/45 | Axill. lymph node | 3 | WT | # | U | # | ND | 30
UACC-1256 | F/67 | Thigh femoral node | 9 | Deleted | ND | ND | ND | ND | ND
M93-007 | F/43 | Ankle | 3 | Deleted | 2.6 ± 0.1% |  |  | ND | 12
UACC-091 | M/52 | Unk | 7 | Deleted | 2.1 ± 0.2% |  |  | ND | 11
UACC-1273 | M/50 | Axill. lymph node | 16 | Mutation | 2.5 ± 0.3% |  |  | ND | 13
TD-1730 | M/55 | Thyroid lobe | Biopsy | ND | ND | ND | ND | ND | ND
TD-1638 | M/49 | Paraspinous | Biopsy | ND | ND | ND | ND | ND | ND
TD-1720 | M/29 | Shoulder | Biopsy | ND | ND | ND | ND | ND | ND
TD-1348 | M/44 | Axill. lymph node | Biopsy | ND | ND | ND | ND | ND | ND
UACC-1022 | F/53 | Chest wall | 13 | WT | 2.9 ± 0.1% |  |  | ND | 63
TC-1376C | M/30 | Distal ileum | 3 | ND | ND | ND | ND | ND | 21
TD-1376C | M/30 | Distal ileum | Biopsy | ND | ND | ND | ND | ND | ND
UACC-2534 | M/68 | Abdomen | 7 | Deleted | 3.2 ± 0.02% |  | ND | ND | 7
UACC-383 | M/69 | Thigh femoral node | 29 | Deleted | 2.3 ± 0.2% |  | ND | 70.40 ± 5.27 | 35
UACC-457 | F/Ukn | Unk | 19 | WT | 3.1 ± 0.2% |  | ND | 12.80 ± 0.05 | ND
UACC-3093 | M/75 | Axill. lymph node | 4 | WT | ND | ND | ND | 40.30 ± 2.00 | 24

Melanoma non-clustered
UACC-930 | F/35 | Sm. bowel | 4 | WT | 4.8 ± 0.3% | ± |  | ND | 50
M93-047 | F/75 | Axill. lymph node | 3 | Mutation | 10.7 ± 0.03% | + | + | ND | 75
UACC-2973 | M/37 | Axill. lymph node | 5 | ND | ND | ND | ND | ND | 48
UACC-903 | M/25 | Back | 14 | Deleted | 3.8 ± 0.3% | + |  | ND | 91
TC-F027 | M/30 | Rt. chest wall | 6 | ND | ND | ND | ND | ND | 91
UACC-1097 | M/56 | Rectus muscle | 6 | Mutation | ND | ND | ND | ND | 34
UACC-647** | M/32 | Axill. node | 14 | WT | 3.8 ± 0.1% | + | ± | ND | 55
UACC-1012 | M/54 | Neck | 3 | ND | 4.9 ± 0.1% | ND | ND | 122.00 ± 11.30 | 54
UACC-827 | F/32 | Rt. breast | 16 | Mutation | ND | ND | ND | ND | 32
WM1791C | Unk | Ukn | 52 | ND | 4.6 ± 0.3% |  | ND | 141.00 ± 11.40 | 71
HA-A | F/Ukn | Ukn | 19 | ND | 3.9 ± 0.5% |  | ND | 211.00 ± 12.40 | 62
UACC-1529 | M/48 | Axill. lymph node | 13 | Mutation | 4.2 ± 0.5% | + |  | ND | ND

Uveal melanoma samples
CCM-1A | Unk | Primary | 25 | ND | 2.2 ± .01% |  |  | ND | ND
C918 | F/60 | Primary | 15 | ND | 12.9 ± 0.3% | + | + | ND | ND
MUM-2C | M | Liver metastases | 8 | ND | 2.0 ± 0.1% |  |  | ND | ND
MUM-2B | M | Liver metastases | 8 | ND | 13.3 ± 0.6% | + | + | ND | ND

Control samples
Nil. C (fibroblast); UACC-3149 (ovarian adenocarcinoma); MCF-10A (breast epithelium); CRL-1634 (fibroblast); SRS-3 (cell culture variant); SRS-5 (cell culture variant); RMS-13 (rhabdomyosarcoma)



* Mutation status of indicated samples for p16 obtained by sequencing. Deleted, homozygous. Supplementary Information includes the specific
mutations in p16 for each sample tested. Samples were also sequenced for β-catenin. No example of β-catenin mutation was observed.

† Ability to invade a defined basement matrix. P = 0.0055; t-test for two populations.

‡ Tube forming ability at 5 days in a three-dimensional matrigel matrix.

§ Ability to contract floating collagen 1 gels at 5 days as compared to HT-1080 fibrosarcoma cells (Maniotis et al. "Vascular channel formation by
human melanoma cells in vivo and in vitro: vasculogenic mimicry" Am. J. Pathol. 155:739-752, 1999; incorporated herein by reference).

‖ Migration rates expressed in µm per day. Mean from eight experiments ± s.d. (P = 0.0063; t-test for two populations). Rates below 100 µm per day
segregate completely in the melanoma primary cluster.

¶ Ability to close in vitro scratch wound at 24 h. Photographs of the wound were measured and percentage wound closure determined (Silletti et al.
"Autocrine motility factor and the extracellular matrix I. Coordinate regulation of melanoma cell adhesion, spreading and migration involves focal
contact reorganization" Int. J. Cancer 76:120-128, 1998; incorporated herein by reference) (P < 0.00002, t-test for two populations).

# M91-054 was the only sample that demonstrated a mixed phenotype in culture with both an epitheloid population and a more fibroblastic
population. Vasculogenic mimicry and gel contraction were only observed in the epitheloid population. Scratch assay resulted in 30% closure after
24 h for both populations.

O TC-1376 mRNA was isolated after short term (3 passage) culture of the biopsy sample from the patient TD-1376 allowing the effects of short term
culture on the expression profile to be observed.

** UACC-647 cells form extensive cord-like networks by 5 days.



Finally, in parallel to our microarray analysis of cutaneous melanoma, we studied a 
series of uveal melanoma specimens characterized for properties related to metastasis, 
including invasive ability and vasculogenic mimicry in vitro (Maniotis et al "Vascular 
channel formation by human melanoma cells in vivo and in vitro: vasculogenic mimicry" 
Am. J. Pathol 155:739-752, 1999; incorporated herein by reference). These samples were 
hybridized pairwise, directly comparing highly invasive cells to their less invasive 
counterparts. We examined the pattern of gene expression in these phenotypically 
characterized cells with respect to the weighted discriminator list (Fig. 2b) that defines the
major cluster of 19 cutaneous melanomas. Strikingly, genes expressed in common in the
highly invasive uveal melanoma cells (Fig. 2b, inset) were strongly anti-correlated with the
same genes from the major cluster of cutaneous melanoma samples (Fig. 2b). This
observation, coupled with the known biological function of genes within the weighted list,
indicated that specimens assigned within the major cutaneous melanoma cluster (Fig. 1a, b)
would have reduced motility and reduced invasive ability as they have down-regulation of
genes related to cell spreading or migration, including formation of focal adhesions (Adams
"Characterization of cell-matrix adhesion requirements for the formation of fascin
microspikes" Mol. Biol. Cell 8:2345-2363, 1997; Scott et al. "pp125FAK in human
melanocytes and melanoma: expression and phosphorylation" Exp. Cell Res. 219:197-203,
1995; each of which is incorporated herein by reference). Specific genes with reduced
expression in the major cluster included integrin β1 (Janji et al. "Autocrine TGF-beta-regulated
expression of adhesion receptors and integrin-linked kinase in HT-144 melanoma
cells correlates with their metastatic phenotype" Int. J. Cancer 83:255-262, 1999; Hieken et al.
"Beta1 integrin expression in malignant melanoma predicts occult lymph node metastases"
Surgery 118:669-673, 1995; each of which is incorporated herein by reference), integrin β3
(Van Belle et al. "Progression-related expression of beta3 integrin in melanomas and nevi"
Hum. Pathol. 30:562-567, 1999; incorporated herein by reference), integrin α1 (Hieken et al.
"Beta1 integrin expression in malignant melanoma predicts occult lymph node metastases"
Surgery 118:669-673, 1995; incorporated herein by reference), syndecan 4 (Woods et al.
"Syndecan-4 binding to the high affinity heparin-binding domain of fibronectin drives focal
adhesion formation in fibroblasts" Arch. Biochem. Biophys. 374:66-72, 2000; incorporated
herein by reference) and vinculin (Helige et al. "Interrelation of motility, cytoskeletal
organization and gap junctional communication with invasiveness of melanocytic cells in
vitro" Invasion Metastasis 17:26-41, 1997; incorporated herein by reference) (Figs 2 and 3;
see Supplementary Information). In samples outside the major cluster, increased expression
of fibronectin is particularly interesting. Together with other reports (Maung et al. "Requirement for
focal adhesion kinase in tumor cell adhesion" Oncogene 18:6824-6828, 1999; Silletti et al.
"Autocrine motility factor and the extracellular matrix I. Coordinate regulation of melanoma
cell adhesion, spreading and migration involves focal contact reorganization" Int. J. Cancer
76:120-128, 1998; each of which is incorporated herein by reference), this observation
indicates that these cells are induced to secrete this pro-migratory molecule, consistent with
an important role for focal contacts in modulating melanoma cell motility.

We then directly tested the prediction from the array results that cell spreading and 
migration could be discordant between melanoma cluster groups. Cutaneous melanomas 
(assigned either in or out of the major cluster) were characterized using a series of cellular 
assays applied to test cell motility and invasiveness (Table 1, Fig. 4). Figure 4 illustrates the 
discordance of cutaneous melanoma samples within the major cluster and those outside this 
group. As predicted from the analysis of their gene expression patterns, melanomas within 
the major cluster had reduced motility (P = 0.0063), invasive ability (P = 0.0055) and 
vasculogenic mimicry in comparison with melanomas outside the major cluster (Table 1). 

The patient population in this study had a uniformly poor prognosis, and neither 
typical clinical factors (for example, age, sex, biopsy site) nor in vitro characteristics (for 
example, passage number) provide strong correlation with clinical outcome, or expression 
information (see Supplementary Information). In contrast, molecular classification of these 
tumors on the basis of gene expression (Fig. 1, Table 1) could identify a previously 
undetected subtype of this cancer. The analyses described here were not designed to address 
the relationship of gene expression profile and clinical outcome in melanoma patients, and
thus the clinical relevance of our observed subgrouping awaits further analysis. However,
survival information was available on 15 patients, and the results, though not statistically
significant, are of interest. Three deaths occurred out of 10 patients in the tight cluster of 19
while 4 deaths occurred out of 5 patients in the remaining group (log-rank P-value = 0.135).
Our results indicate melanoma will provide a unique opportunity to study a homogeneous
group of patients to determine if gene expression patterns predict prognosis or therapeutic
response in settings where we cannot currently determine who is most at risk for rapid
disease progression and death.



Finally, classification of melanoma on the basis of gene expression patterns is 
possible, despite the prevailing view that the 'taxonomy' of this disease falls in a continuous 
spectrum lacking discernible entities. Our data show that melanoma is a useful model to 
identify genes critical for aspects of the metastatic process, including tumour cell motility and 
the ability to form primitive tubular networks that may contribute to tumour perfusion. The 
extent to which melanoma samples can be clinically subdivided by expression patterns 
remains to be elucidated. However, our identification of genes 'weighted' for their ability to
discriminate a subset of melanomas should provide a sound molecular basis for the dissection
of other clinically relevant subsets of this tumour.

Methods 
Samples 

Cultured cells were collected and mRNA isolated as described (Khan et al. "DNA Microarray
technology: the anticipated impact on the study of human disease" Biochim. Biophys. Acta
1423:17-28, 1999; www.nhgri.nih.gov/DIR/microarray; each of which is incorporated herein
by reference). Samples underwent a series of controls for quality of mRNA, labeling and
hybridization, as well as sample integrity (including genotyping DNA from all samples with
five dinucleotide markers from four different chromosomes to ensure individuality). The
entire coding sequence of the p16 gene and exon 3 of the β-catenin gene were sequenced to
assess the mutation status of all available samples (see Supplementary Information). The
biopsy tumour specimens used in this study were obtained with Institutional Review Board 
approval and clinical information is provided in the Supplementary Information. Biopsies 
were debrided, dissected into small pieces and frozen in liquid nitrogen. Frozen specimens 
were immediately placed into TRIzol Reagent (Gibco BRL), homogenized and mRNA 
isolated as described (Khan et al "DNA Microarray Technology: The Anticipated Impact on 
the Study of Human Disease" Biochim. Biophys. Acta 1423:17-28, 1999; 
www.nhgri.nih.gov/DIR/microarray; each of which is incorporated herein by reference). 

Microarrays 

The 8,150 human cDNAs used in this study were obtained under a Cooperative Research and 
Development Agreement with Research Genetics and 6,912 were verified by sequence. This 
set of cDNAs is part of a larger collection (Khan et al. "Gene expression profiling of alveolar 
rhabdomyosarcoma with cDNA microarrays" Cancer Res. 58:5009-5013, 1998; Duggan et 
al. "Expression profiling using cDNA microarrays" Nature Genet. 21:10-14, 1999; 
www.nhgri.nih.gov/DIR/microarray; each of which is incorporated herein by reference). On 
the basis of the Unigene build of 9 March 2000 
(http://www.ncbi.nlm.nih.gov/UniGene/build.html), the 8,150 cDNAs represent 6,971 unique 
genes in this melanoma array. All clones were confirmed by resequencing if necessary. 
Microarrays were hybridized and scanned, and image analysis was performed, as described (Khan et 
al "Gene expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays" 
Cancer Res. 58:5009-5013, 1998; Khan et al. "DNA Microarray technology: the anticipated 
impact on the study of human disease" Biochim. Biophys. Acta 1423:17-28, 1999; 
www.nhgri.nih.gov/DIR/microarray; each of which is incorporated herein by reference). The 
raw data from the microarray is shown in Appendix A, a Microsoft Excel Worksheet, which 
has been included on a CD-ROM submitted with this application and is incorporated herein 
by reference. 

Statistical methods 

Detailed information on all statistical methods is in the Supplementary Information. 
Agglomerative hierarchical clustering of the 31 melanomas on the basis of their gene 
expression profiles was performed as described (Khan et al. "Gene expression profiling of 
alveolar rhabdomyosarcoma with cDNA microarrays" Cancer Res. 58:5009-5013, 1998; 
Bittner et al. "Data analysis and integration: of steps and arrows" Nature Genet. 22:213-215, 
1999; each of which is incorporated herein by reference), to investigate relationships between 
tumour samples. Average linkage was used, as well as a dissimilarity measure of one minus 
the Pearson correlation coefficient of log ratios. The cutoff employed to obtain the observed 
partitioning was 0.54. The MDS was performed using an implementation of MDS in the 
MATLAB package. A non-hierarchical clustering algorithm (Ben-Dor et al. "Clustering 
gene expression patterns" J. Comput. Biol. 6:281-297, 1999; incorporated herein by 
reference) was used to define experimental clusters. This algorithm takes a graph-theoretic 
approach and makes no assumptions about the similarity function or the number of clusters 
sought. 

To generate the weighted gene list, cluster compaction and separation were evaluated. 
For a given clustering result, $n_1 = 19$ and $n_2 = 12$, the discriminative weight of each gene is 
$w = d_B/(k_1 d_{w1} + k_2 d_{w2} + \alpha)$, where $d_B$ is the centre-to-centre distance (between-cluster Euclidean 
distance), $d_{wi}$ is the average Euclidean distance among all sample pairs within cluster $i$, 
$k_i = t_i/(t_1 + t_2)$ for a total of $t_i$ sample pairs in cluster $i$, and $\alpha$ is a small constant (0.1 in our study) 
to prevent the zero denominator case (Fig. 2a). Genes may then be ranked on the basis of $w$. 
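
The average-linkage clustering described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function names and the toy data are assumptions, and only the dissimilarity measure (one minus the Pearson correlation of log ratios), the average linkage rule and the 0.54 cutoff come from the text.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined for a constant vector

def average_linkage(samples, cutoff):
    """Merge clusters until the closest pair is further apart than cutoff.

    samples: list of log-ratio vectors, one per tumour sample.
    Returns the clusters as sorted lists of sample indices.
    """
    n = len(samples)
    # Dissimilarity = 1 - Pearson correlation of the log ratios.
    d = [[0.0 if i == j else 1.0 - pearson(samples[i], samples[j])
          for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def avg_dist(a, b):
        # Average linkage: mean dissimilarity over all cross-cluster pairs.
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        height, p, q = min(
            (avg_dist(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if height > cutoff:  # equivalent to cutting the dendrogram at cutoff
            break
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for i, c in enumerate(clusters) if i not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)

# Hypothetical log-ratio profiles for four samples (not study data).
demo = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0],
    [3.9, 3.2, 1.8, 1.1],
]
print(average_linkage(demo, cutoff=0.54))
```

Stopping the merges once the smallest average dissimilarity exceeds the cutoff gives the same partition as cutting the full dendrogram at that height, because average-linkage merge heights never decrease.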

In vitro biological assays 

Floating collagen lattices were prepared and used to test selected cell lines for their 
ability to deform the gels as described (Maniotis et al. "Vascular channel formation by 
human melanoma cells in vivo and in vitro: vasculogenic mimicry" Am. J. Pathol. 155:739-752, 
1999; Table 1 legend). Samples were also tested for their ability to migrate into an in 
vitro scratch wound as described (Tamura et al. "Inhibition of cell migration, spreading and 
focal adhesions by tumor suppressor PTEN" Science 280:1614-1617, 1998; incorporated 
herein by reference). Cells were stained with Giemsa, a digital micrograph of the region was 
prepared and the stained area as a percent of total area in the scraped and open sub-regions 
was estimated by a thresholding procedure using IPLabs Spectrum (Scanalytics, Vienna, 
Virginia) software. Results in Table 1 represent data from 24 h after plating on coverslips 
treated with fibronectin (FN; 10 µg ml⁻¹; Tamura et al. "Inhibition of cell migration, 
spreading and focal adhesions by tumor suppressor PTEN" Science 280:1614-1617, 1998; 
incorporated herein by reference). 

Examples of tubular network formation (associated with vasculogenic mimicry) could 
be observed following seeding of cell lines onto three-dimensional gels of polymerized 
Matrigel or Type I collagen (Collaborative Biochemical) as described (Maniotis et al. 
"Vascular channel formation by human melanoma cells in vivo and in vitro: vasculogenic 
mimicry" Am. J. Pathol. 155:739-752, 1999; Table 1). 

Table 1 lists results from high throughput screening for cell migration as the radial 
dispersion of cells from an initial confluent monolayer of 2,000 melanoma cells deposited 
within a 1.0 mm circular area on glass surfaces precoated with FN (100 µg ml⁻¹; Berens et al. 
"The role of extracellular matrix in human astrocytoma migration and proliferation studied in 
a microliter scale assay" Clin. Exp. Metastasis 12:405-415, 1994; Giese et al. "Contrasting 
migratory response of astrocytoma cells to tenascin mediated by different integrins" J. Cell 
Sci. 109:2161-2168, 1996; each of which is incorporated herein by reference). 

Selected cell lines were tested for their ability to invade a defined basement 
membrane matrix. Tumor cells (1 × 10^5) were seeded into the upper wells of the membrane 
invasion culture system (MICS) chamber (Hendrix et al. "A simple quantitative assay for 
studying the invasive potential of high and low human metastatic variants" Cancer Lett. 
38:137-147, 1987; incorporated herein by reference) onto collagen/laminin/gelatin-coated 
(Sigma) polycarbonate membranes containing 10-µm pores (Osmonics, Livermore, 
California) containing 1x Mito+ Serum Extender (Becton Dickinson). After 24 h of 
incubation at 37°C, the cells that invaded each membrane were collected, stained and counted 
as described (Hendrix et al. "Role of intermediate filaments in migration, invasion and 
metastasis" Cancer Metastasis Rev. 15:507-525, 1996; incorporated herein by reference). 
Percent invasion was corrected for proliferation and calculated as (total number of invading 
cells / total number of cells seeded) × 100. 



Supplement I - Statistical Methods for Clustering of Gene Expression Data and 
Validation of Cluster Predictions 

OVERVIEW: 

To fully appreciate the expression patterns derived from a large number of cDNA microarrays 
and their relationship to the melanoma tumor samples, several statistical methods were 
integrated as follows: 

a. A multidimensional scaling (MDS) method was employed in order to visualize the 
similarity between samples, and a hierarchical clustering dendrogram was produced by an 
implementation of the average-linkage clustering algorithm. 

b. The clustering results were further verified by a non-hierarchical algorithm, CAST (Ben-Dor 
et al. J. Comput. Biol. 6:281-297, 1999; incorporated herein by reference). 

c. In order to determine the tightness and the statistical significance of the clusters derived 
from the various methods, two independent approaches were used to validate the 
prediction. One, the WADP_k method, is a sensitivity analysis of the effect of noise perturbation on the 
data set. The other is based on comparing the discrimination power observed for 
genes in the data to that expected in random data. This is accomplished using TNoM 
scoring. 

d. After confirming the clustering result, each gene was weighted based on its 
discriminative ability for the clusters derived from the previous methods. 

In the following section, detailed descriptions of the methods listed in Steps c and d will be 
presented. For some of the more standard methods, such as MDS, average-linkage methods, 
and CAST, we refer readers to the literature (Ben-Dor et al. J. Comput. Biol. 6:281-297, 
1999; Eisen et al. Proc. Natl. Acad. Sci. USA 95:14863-14868, 1998; Everitt Cluster Analysis 
(London: Edward Arnold), 1993; each of which is incorporated herein by reference). Since 
not all genes were readily detectable by the array method, a subset of the total number of 
surveyed genes was analyzed in all cases. A set of 3613 genes was chosen for analysis. The 
genes were chosen by an empirically derived set of criteria requiring an average mean 
intensity above background of the least intense signal (Cy3 or Cy5) across all experiments 
>2000 arbitrary units, and an average spot size across all experiments of >30 pixels. To 
avoid distortions of the data resulting from ratios where the signal in one channel is large, and 
the signal in the other channel is undetectable, ratios higher than 50 or lower than 0.02 were 
truncated to 50 or 0.02 for these analyses. 
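
The gene selection and ratio truncation rules above can be sketched as below. The function names and the input layout are assumptions made for illustration; in particular, the sketch reads the intensity criterion as "the weaker of the two channels, averaged across experiments, exceeds 2000 units", which is one plausible interpretation of the text.

```python
def select_genes(weaker_channel, spot_size, min_intensity=2000.0, min_spot=30.0):
    """Return indices of genes passing both averaged quality criteria.

    weaker_channel[g][e]: background-subtracted intensity of the less
        intense channel (Cy3 or Cy5) for gene g in experiment e.
    spot_size[g][e]: spot size in pixels for gene g in experiment e.
    """
    keep = []
    for g, (intensities, sizes) in enumerate(zip(weaker_channel, spot_size)):
        mean_intensity = sum(intensities) / len(intensities)
        mean_size = sum(sizes) / len(sizes)
        if mean_intensity > min_intensity and mean_size > min_spot:
            keep.append(g)
    return keep

def truncate_ratio(ratio, low=0.02, high=50.0):
    """Clamp an expression ratio to the interval [0.02, 50]."""
    return min(max(ratio, low), high)

# Hypothetical measurements for three genes in two experiments.
intensity = [[2500.0, 2100.0], [1500.0, 2000.0], [4000.0, 3500.0]]
size = [[40.0, 35.0], [40.0, 35.0], [20.0, 30.0]]
print(select_genes(intensity, size))                    # only gene 0 passes both tests
print([truncate_ratio(r) for r in (120.0, 1.5, 0.001)])
```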

Description of the WADP_k method for testing the validity of cluster predictions 

Hierarchical clustering of the 31 melanoma samples was performed, resulting in a 
dendrogram (Fig. 1b). Although the dendrogram gives insights about the similarity and 
relatedness among samples, it does not indicate robustness to variability associated with the 
assay, sampling, etc. In order to draw valid conclusions about the clustering structure present 
in the data, it is necessary to investigate how variability affects the results of the cluster 
analysis. To this end, we developed and implemented a method that determines the 
reproducibility of given levels of cluster structure within the dendrogram under the condition 
of added noise. The method is described below. 

First, cut the original dendrogram at a height that results in $k$ clusters and let $N_k$ denote 
the number of clusters containing 2 or more elements. Let $M_i$ represent the number of pairs of 
elements in the $i$th of the $N_k$ clusters. Next, perturb the data by adding to every log-ratio of 
each sample an independent random deviate generated from the $N(0, \sigma^2)$ distribution. Cluster 
the perturbed data and cut the resulting dendrogram at a height that again results in $k$ clusters. 
For the $M_i$ pairs of elements in the $i$th original cluster, record the number of those pairs, $D_i$, 
that do not remain together in the clustering of the perturbed data. Next, calculate the overall 
discrepancy rate for the clustering: 

$$\frac{D_1 + D_2 + \cdots + D_{N_k}}{M_1 + M_2 + \cdots + M_{N_k}}$$

This overall discrepancy rate is a weighted average of the $N_k$ cluster-specific discrepancy rates 
(i.e., $D_i/M_i$ for $i = 1, 2, \ldots, N_k$), with weights proportional to the number of pairs in individual 
clusters. Finally, repeat the calculations over many perturbations of the original data set and 
report the average overall discrepancy rate (termed the Weighted Average Discrepant Pairs 
for $k$ clusters, or WADP_k). The above procedure is repeated for all possible cuts of the 
original dendrogram and WADP_k is plotted versus $k$. Minima of the WADP_k curve are 
interpreted as indicating reproducible levels of structure. 

The parameter $\sigma$ represents the noise standard deviation inherent to the system. As 
mentioned above, the noise is composed of, at the least, assay variability and sampling 
variability. $\sigma$ is unknown and must be estimated. The method we use for estimating $\sigma$ is to 
compute the variance of the log-ratio of each gene across all samples. We then use the 
median of the empirical distribution of these variances as an estimate of $\sigma^2$. It may be more 
appropriate to use a smaller value (say the tenth percentile of the empirical distribution), if it 
were believed that a large percentage of genes present on the array were truly differentially 
expressed within the population of samples hybridized. 
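
The perturbation procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the helper names are hypothetical, the clustering step is a compact average-linkage routine stopped at k clusters, and Euclidean distance is used for the toy data where the study used one minus the Pearson correlation.

```python
import random
import statistics

def euclid(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def cut_to_k(samples, k, dist=euclid):
    """Average-linkage agglomeration stopped when k clusters remain."""
    clusters = [[i] for i in range(len(samples))]

    def avg(a, b):
        return sum(dist(samples[i], samples[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        _, p, q = min(
            (avg(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merged = clusters[p] + clusters[q]
        clusters = [c for i, c in enumerate(clusters) if i not in (p, q)] + [merged]
    return clusters

def estimate_sigma(samples):
    """Square root of the median per-gene variance across samples."""
    n_genes = len(samples[0])
    variances = [statistics.variance([s[g] for s in samples]) for g in range(n_genes)]
    return statistics.median(variances) ** 0.5

def wadp(samples, k, sigma, n_perturb=100, seed=0):
    """Average overall discrepancy rate for a cut into k clusters."""
    rng = random.Random(seed)
    original = [c for c in cut_to_k(samples, k) if len(c) >= 2]
    # All within-cluster pairs of the original clustering (sum of the M_i).
    pairs = [(i, j) for c in original for a, i in enumerate(c) for j in c[a + 1:]]
    if not pairs:
        return 0.0
    total = 0.0
    for _ in range(n_perturb):
        noisy = [[v + rng.gauss(0.0, sigma) for v in s] for s in samples]
        label = {i: idx for idx, c in enumerate(cut_to_k(noisy, k)) for i in c}
        # Sum of the D_i: pairs that no longer share a cluster.
        discrepant = sum(1 for i, j in pairs if label[i] != label[j])
        total += discrepant / len(pairs)
    return total / n_perturb

# Hypothetical data: two well separated groups of two samples each.
toy = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [10.0, 10.0, 10.0], [10.1, 10.0, 9.9]]
print(wadp(toy, k=2, sigma=0.01, n_perturb=20))
```

Evaluating `wadp` for each possible k and plotting the result against k gives the curve whose minima are read as reproducible levels of structure.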

Description of the TNoM method for the cluster significance based on random partition. 

Threshold number of misclassification, or TNoM score, is a simple threshold-based 
method that uses a given expression level, for a given gene, to predict the cluster label of a 
given test sample. In the present study, we have 31 samples from 2 groups. Therefore, we can 
label the samples by $l_i$, $i = 1, \ldots, m$, where $l_i \in \{0,1\}$ and $m = 31$. For the $k$th gene, let $(x_i, l_i)_k$ 
be its expression pattern (or ratios in this study) and corresponding cluster labels. A threshold 
function is defined as 

$$f_{h,a}(x) = \begin{cases} a, & \text{if } x < h \\ 1 - a, & \text{otherwise} \end{cases}$$

where $h$ is a threshold value, and $a \in \{0,1\}$. For a given $h$ and $a$ we can assign the label 
$f_{h,a}(x_i)$ to the $i$th sample. The number of misclassifications entailed by this scheme is 

$$\mathrm{Err}_k(h,a) = \sum_{i=1}^{m} \left| l_i - f_{h,a}(x_i) \right|$$

The TNoM score for the $k$th gene is defined as the minimum error achieved over all 
possible choices of $h$ and $a$: 

$$\mathrm{TNoM}_k = \min_{h,a} \mathrm{Err}_k(h,a)$$

The minimization step is accomplished by exhaustively searching all $2(m+1)$ possibilities. 

To examine the significance of groups derived by the clustering algorithm, we used three 
steps. First, we evaluated TNoM scores for all genes found in the data set. Then, the number 
of genes that have a TNoM score less than or equal to s, for s = 0, ..., 12 (where 12 is the 
maximum number of misclassifications any classification rule may commit), was listed. Next, we 
randomly assigned cluster labels to all samples to form two arbitrary groups of 19 and 12 
samples. The TNoM score was again evaluated for each gene. A list of the number of genes 
that have a TNoM score less than or equal to s was similarly obtained. We repeated this process 
50 times to observe random fluctuations and their range of scores. Finally, the expected 
number of genes resulting in s or fewer misclassifications under the assumption of perfectly 
random gene expression patterns can be calculated (Ben-Dor et al., submitted for 
publication). As expected, the values produced by the 50 random samplings are close to those 
produced by the rigorous theoretical calculation. The significance of the suggested clusters is 
reflected in the overabundance of genes with low TNoM scores. More precisely, a 
meaningful partition will produce far more genes with low TNoM scores than a random one. 
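
The TNoM score and the random-partition comparison can be sketched as below. The function names and the toy data are assumptions for illustration. The threshold rule assigns label a below the threshold and the other label otherwise, and the search covers the m+1 distinct threshold positions for each of the two values of a.

```python
import random

def tnom(values, labels):
    """Minimum number of misclassifications over all threshold rules (h, a)."""
    xs = sorted(set(values))
    # One threshold below all values, one between each pair of neighbours,
    # and one above all values.
    thresholds = [xs[0] - 1.0]
    thresholds += [(lo + hi) / 2.0 for lo, hi in zip(xs, xs[1:])]
    thresholds.append(xs[-1] + 1.0)
    best = len(values)
    for h in thresholds:
        for a in (0, 1):
            errors = sum(
                1 for x, label in zip(values, labels)
                if (a if x < h else 1 - a) != label
            )
            best = min(best, errors)
    return best

def genes_at_or_below(expression, labels, s):
    """Number of genes whose TNoM score is at most s."""
    return sum(1 for gene in expression if tnom(gene, labels) <= s)

def random_partition_counts(expression, labels, s, n_rounds=50, seed=0):
    """Repeat the count with shuffled labels, keeping the group sizes fixed."""
    rng = random.Random(seed)
    shuffled = list(labels)
    counts = []
    for _ in range(n_rounds):
        rng.shuffle(shuffled)
        counts.append(genes_at_or_below(expression, shuffled, s))
    return counts

# Hypothetical expression ratios: rows are genes, columns are six samples.
labels = [0, 0, 0, 1, 1, 1]
expression = [
    [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],   # separates the groups perfectly
    [5.0, 1.0, 6.0, 2.0, 7.0, 3.0],      # no clean threshold
]
print(genes_at_or_below(expression, labels, s=0))
print(random_partition_counts(expression, labels, s=0)[:5])
```

Comparing the observed count with the distribution of counts from shuffled labels indicates whether low scores are more abundant than a random partition would produce.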

Description of the weighting method based on gene's discriminative ability. 

The clustering algorithms described in the text produced one tightly bonded cluster of 
$n_1 = 19$ samples, and we assume the remaining $n_2 = 12$ samples form another cluster. For a given 
two-cluster setting, a discriminative weight for each gene can be evaluated by 

$$w = \frac{d_B}{k_1 d_{w1} + k_2 d_{w2} + \alpha}$$

where $d_B$ is the center-to-center distance (between-cluster Euclidean distance), $d_{wi}$ is the 
average Euclidean distance among all sample pairs within cluster $i$ (a total of $t_1$ and $t_2$ sample pairs for clusters 1 
and 2, respectively), $k_1 = t_1/(t_1 + t_2)$ and $k_2 = t_2/(t_1 + t_2)$, and $\alpha$ is a small constant (0.1 in our 
study) to prevent a zero denominator. Genes may then be ranked on the basis of $w$. The 
equation for the weight $w$ is designed not only to evaluate the discriminative ability of a single gene, 
but also to evaluate the discriminative ability of 2 or more genes together. If the 
second group of samples is not assumed to be a tight cluster, the $d_{w2}$ term can be dropped. 
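
A minimal sketch of the weight calculation follows. The function names and the example values are assumptions. Each sample is represented as a vector, so a single gene is a vector of length one and a set of genes is a longer vector.

```python
import math
from itertools import combinations

def euclid(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def within_cluster(cluster):
    """Return (average pairwise distance, number of sample pairs)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0, 0
    return sum(euclid(a, b) for a, b in pairs) / len(pairs), len(pairs)

def centre(cluster):
    return [sum(column) / len(cluster) for column in zip(*cluster)]

def discriminative_weight(cluster1, cluster2, alpha=0.1):
    """w = d_B / (k1*d_w1 + k2*d_w2 + alpha) for two clusters of sample vectors."""
    d_b = euclid(centre(cluster1), centre(cluster2))
    d_w1, t1 = within_cluster(cluster1)
    d_w2, t2 = within_cluster(cluster2)
    total_pairs = t1 + t2
    k1 = t1 / total_pairs if total_pairs else 0.0
    k2 = t2 / total_pairs if total_pairs else 0.0
    return d_b / (k1 * d_w1 + k2 * d_w2 + alpha)

# Hypothetical single-gene log ratios in two clusters of two samples each.
cluster_a = [[0.0], [2.0]]
cluster_b = [[10.0], [12.0]]
print(discriminative_weight(cluster_a, cluster_b))
```

In this toy case the centres are 10 apart and each cluster has one pair at distance 2, so the denominator is 0.5·2 + 0.5·2 + 0.1 = 2.1. A gene whose clusters are compact and far apart receives a large weight.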

Supplement II - Statistical Analysis of Clinical and Culture Characteristics of 
Melanoma Clusters 

SUMMARY REPORT: 

Thirty-one tissue specimens were clustered using the Bioclust clustering algorithm 
(see text), resulting in one tight cluster of 19 specimens (Group A) and 12 specimens that 
showed no specific clustering pattern (Group B). Statistical tests were performed to 
determine whether any clinical or tumor cell characteristics were specifically associated with 
cluster group. For categorical variables we created a contingency table and used Fisher's 
exact test to compute a p-value (the Chi-square test was not used because each table had at 
least one expected cell frequency less than 5). For continuous and ordered variables, we used 
the Wilcoxon two-sample (rank-sum) test, a non-parametric alternative to the two-sample t 
test. Tests were performed in S-plus 4.5 and StatXact 3.1. 
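
For a 2 × 2 contingency table, the two-sided Fisher's exact test can be sketched as below. This is an illustration only: the function name is an assumption, the analyses in this supplement were run in S-plus and StatXact, and the sketch does not cover the larger tables (for example the three-category mutation status and biopsy site tables). The counts in the usage lines are the sex-by-group tables reported below.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    tolerance = 1.0 + 1e-9  # guards against floating-point ties
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_observed * tolerance)

# Sex by group, all specimens: F = (4, 4), M = (12, 7).
print(round(fisher_exact_2x2(4, 4, 12, 7), 4))
# Sex by group, cell lines only: F = (4, 4), M = (8, 7).
print(round(fisher_exact_2x2(4, 4, 8, 7), 4))
```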



The two groups consisted of the following patient IDs: 

Group A: 
M93-007     M91-054     UACC091     UACC502 
UACC1256    UACC127     UACC253     M92-001 
UACC457     UACC383     UACC309     A-375 
UACC1022    TD1376-3    TD1683      TD1720 
TD1384      TD1730      TC1376-3 

Group B: 
HA-A        UACC827     UACC1529 
UACC647     UACC930     M93-047 
UACC2837    TC-F027     WM1791C 
UACC1012    UACC1097    UACC903 


As noted in the text, two pairs of specimens in Group A were derived from the same 
patient. The two pairs are M93-007 & M92-001 and TD1376-3 & TC1376-3. In our 
analyses, we only considered the data for each of these patients once or, as specifically noted, 
entirely removed the specimens for these patients from the analysis. 

We first performed an analysis that included all specimen types (tissues and cell 
lines). We tested for associations between group and the following variables: sex, age, 
mutation status, biopsy site*, pigment, Breslow thickness, Clark level, and specimen type. 
None of the variables tested was shown to be associated with cluster group (at the 
0.05 significance level). 

Although there was not a statistically significant association between group and 
specimen type (p=0.106), it was noteworthy that all 5 tissue specimens were located in Group 
A. We therefore performed another analysis in which we only considered data from cell 
lines. In the analysis of cell lines, no variables were associated with cluster group at the 0.05 
significance level, although "age" did have a marginal association (p=0.0812). Passage 
number was also tested in this analysis and had no association with group (p=0.8570). 

* Biopsy site was broken down into the following three categories: skin/external (including ankle, 
abdomen/chest, shoulder, breast, neck/forehead and back), internal (including chest wall, distal ileum, 
paraspinous, thyroid lobe, small bowel, rectus muscle and intra-abdominal), and lymph nodes (including 
axillary, cervical and thigh femoral). 

Next, we investigated differences in survival between the two cluster groups. We 
used a measure of survival that indicated survival time from the date of biopsy. Four cases 
(including the previous two) had a biopsy date falling in 1998 and a known status (alive or 
dead) for which a specific date of death or last follow-up was unknown. In order to use these 
cases in the survival analysis, the survival/follow-up time in these cases was arbitrarily set to 
1 year if the biopsy date occurred prior to 7/1/98 or 0.5 years if the biopsy date occurred on or 
after 7/1/98. 

The data used in the survival analysis are shown in Figure 1. A total of 15 cases were 
included in the analysis, 10 from Group A and 5 from Group B. Survival/follow-up times 
were rounded to the nearest quarter year. A Kaplan-Meier survival plot was created and log- 
rank test performed. No statistically significant association between group and survival was 
found (p=0.135). 

The analyses performed resulted in no significant association with cluster group. 
However, this does not necessarily mean associations do not exist between the groups and the 
clinical and tumor characteristics tested. The power of the tests we performed is limited by 
the amount of data available for each variable. For example, only 6 specimens in Group A 
and 3 in Group B have information on Breslow thickness. Finding significant associations 
with so few data is unlikely. The power of the tests would increase with more complete data 
on the existing specimens and by the addition of new specimens to the data set. Such studies 
are underway in our laboratory. 

ANALYSIS OF ALL SPECIMENS: 

Group A = specimens that cluster; Group B = others. 

Two pairs of specimens in Group A (M93-007/M92-001 & TD1376-3/TC1376-3) were 
derived from the same patient. The clinical and tumor characteristics for each of these 
patients are only considered once in the below analyses. 

SEX - no statistically significant association with group 

Contingency table with Fisher's exact test 
        A     B 
F       4     4     p-value = 0.6754 
M      12     7     alternative hypothesis: two-sided 

AGE - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.1397 
data: x: age w/group = A , and y: age w/group = B 
Mann-Whitney Statistic: W = 102.0, n=15, m=10 
alternative hypothesis: two-sided 

MUTATION STATUS - no statistically significant association with group 

Contingency table with Fisher's exact test 
              A     B 
mutated       2     4     p-value = 0.1713 
deleted       6     1     alternative hypothesis: two-sided 
WT            4     2 

Contingency table with Fisher's exact test 
Combined mutated and deleted into one category. 
              A     B 
mut./del.     8     5     p-value = 1 
WT            4     2     alternative hypothesis: two-sided 

BIOPSY SITE - no statistically significant association with group 

Contingency table with Fisher's exact test 
                  A     B 
skin/external     3     3     p-value = 0.8763 
internal          4     3     alt. hypothesis: two-sided 
LN                7     4 

PIGMENT - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.2631 
Pigment Type: light=1, med=2, dark=3 
(amelanotic = light; tan = med; pigmented = dark.) 
data: x: pig. type w/group = A , and y: pig. type w/group = B 
Mann-Whitney Statistic: W = 76.5, n=13, m=9 
alternative hypothesis: two-sided 

BRESLOW THICKNESS - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.2619 
data: x: thickness w/group = A , and y: thickness w/group = B 
Mann-Whitney Statistic: W = 14.0, n=6, m=3 
alternative hypothesis: two-sided 

CLARK LEVEL - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.4481 
Clark level: II=2, III=3, IV=4 
data: x: Clark level w/group = A , and y: Clark level w/group = B 
Mann-Whitney Statistic: W = 19.5, n=6, m=5 
alternative hypothesis: two-sided 

For the below analysis, the two pairs of specimens in Group A derived from the same patient 
(M93-007/M92-001 & TD1376-3/TC1376-3) were removed. 

SPECIMEN TYPE - no statistically significant association with group 

Contingency table with Fisher's exact test 
              A     B 
cell line    11    12     p-value = 0.106 
tissue        4     0     alternative hypothesis: two-sided 

ANALYSIS OF CELL CULTURES: 



Group A = specimens that cluster; Group B = others. 

A pair of cell lines in Group A (M93-007/M92-001) was derived from the same patient. The 
clinical and tumor characteristics for this patient are only considered once in the below 
analyses. 

SEX - no statistically significant association with group 

Contingency table with Fisher's exact test 
        A     B 
F       4     4     p-value = 1 
M       8     7     alternative hypothesis: two-sided 

AGE - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.0812 
data: x: age w/group = A , and y: age w/group = B 
Mann-Whitney Statistic: W = 80.0, n=11, m=10 
alternative hypothesis: two-sided 

MUTATION STATUS - no statistically significant association with group 

Contingency table with Fisher's exact test 
              A     B 
mutated       2     4     p-value = 0.1713 
deleted       6     1     alternative hypothesis: two-sided 
WT            4     2 

Contingency table with Fisher's exact test 
Combined mutated and deleted into one category. 
              A     B 
mut./del.     8     5     p-value = 1 
WT            4     2     alternative hypothesis: two-sided 



BIOPSY SITE - no statistically significant association with group 

Contingency table with Fisher's exact test 
                  A     B 
skin/external     2     3     p-value = 0.7272 
internal          2     3     alt. hypothesis: two-sided 
LN                6     4 

PIGMENT - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.4212 
Pigment Type: light=1, med=2, dark=3 
(amelanotic = light; tan = med; pigmented = dark.) 
data: x: pig. type w/group = A , and y: pig. type w/group = B 
Mann-Whitney Statistic: W = 50.5, n=9, m=9 
alternative hypothesis: two-sided 

BRESLOW THICKNESS - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.2000 
data: x: thickness w/group = A , and y: thickness w/group = B 
Mann-Whitney Statistic: W = 8.0, n=3, m=3 
alternative hypothesis: two-sided 

CLARK LEVEL - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.6349 
Clark level: II=2, III=3, IV=4 
data: x: Clark level w/group = A , and y: Clark level w/group = B 
Mann-Whitney Statistic: W = 13.0, n=4, m=5 
alternative hypothesis: two-sided 

For the below analysis, the pair of specimens derived from the same patient in Group A 
(M93-007/M92-001) was removed. 

PASSAGE NUMBER - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.8570 
Passage #'s for established cell lines were set equal to 21. 
data: x: passage # w/group = A , and y: passage # w/group = B 
Mann-Whitney Statistic: W = 34.0, n=8, m=8 
alternative hypothesis: two-sided 

Contingency table with Fisher's exact test 
           A     B 
1-5        3     4     p-value = 0.8695 
6-10       4     2     alternative hypothesis: two-sided 
11-20      4     5 
>20        1     1 

SURVIVAL ANALYSIS: 

Data used in the survival analysis: 

Pt. ID       Group    Status    Time 
M93-007      A        0         7 
M91-054      A        0         7 
UACC091      A        0         7 
UACC502      A        1         0.5 
UACC2534     A        1         0.25 
TD1683       A        1         1 
TD1720       A        0         0.5 
TD1348       A        0         5 
TD1730       A        0         0.5 
TC1376-3     A        0         3 
UACC827      B        1         0.5 
UACC930      B        1         2.25 
M93-047      B        0         6 
TC-F027      B        1         1 
UACC903      B        1         0.25 

Status: 0 = alive, 1 = dead 
Time is in years. 
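
The log-rank comparison of the two groups can be sketched from the survival data listed above. This is a minimal illustration and not the S-plus procedure used for the reported analysis; the function name is an assumption. The sketch uses the standard two-group log-rank statistic with a chi-square reference distribution on one degree of freedom.

```python
import math

def logrank_p(group_a, group_b):
    """Two-sided log-rank p-value.

    Each group is a list of (time, status) pairs, with status 1 = dead
    and status 0 = alive (censored).
    """
    everyone = group_a + group_b
    event_times = sorted({t for t, status in everyone if status == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _ in group_a if u >= t)   # at risk in A
        n_b = sum(1 for u, _ in group_b if u >= t)   # at risk in B
        d_a = sum(1 for u, s in group_a if u == t and s == 1)
        d_b = sum(1 for u, s in group_b if u == t and s == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi_square / 2.0))

# (time in years, status) for each patient, as listed in the table above.
group_a = [(7, 0), (7, 0), (7, 0), (0.5, 1), (0.25, 1),
           (1, 1), (0.5, 0), (5, 0), (0.5, 0), (3, 0)]
group_b = [(0.5, 1), (2.25, 1), (6, 0), (1, 1), (0.25, 1)]
print(round(logrank_p(group_a, group_b), 3))
```

Patients censored at an event time are counted as still at risk at that time, which is the usual convention.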



Example 2 - Expression of Wnt5a in Cell Lines with Originally Low Level Expression 

Wnt5a scored very high out of all the marker genes analyzed in the ability to 
discriminate between highly invasive malignant melanoma and less invasive melanoma. 
Melanoma samples with high levels of Wnt5a expression were more aggressive tumors than 
those with lower levels of Wnt5a expression. Figure 6 shows the top 22 genes selected for 
their ability to distinguish highly invasive malignant melanoma from less invasive melanoma. 
Wnt5a is at the top of the list of these marker genes. 

Figure 6 also shows Wnt5a's expected signaling pathway in contrast to the Wnt1 
pathway. Wnt1 is known to be transforming; however, its proximal methods of signaling are 
very different from those of Wnt5a. In some studies, researchers have observed that the two 
pathways seem to oppose each other in terms of downstream effects. In the Wnt5a pathway, 
the first transduction of the Wnt5a signal is accomplished through the interaction of Wnt5a 
with a G protein-coupled receptor, frizzled 5 (FZD5). The signal is subsequently transduced 
through the PLC/IP3/DAG/PKC pathways. The Wnt5a signal eventually leads to integrin 
interactions, cytoskeletal effects, and other cellular effects. 

Low level expression of Wnt5a in the cluster of 19 melanomas was verified by real 
time PCR. Data for the samples WM-1791C and UACC-1273 are shown in Figure 7. The 
real time PCR results show that there is much more Wnt5a transcript in cell line WM-1791C, 
which originally was scored as having high level expression of Wnt5a by gene chip analysis, 
than in UACC-1273, which was originally scored as having low level expression. Vectors 
used to express higher levels of Wnt5a in cells that normally express low levels were 
developed using standard techniques to see if the phenotype of less aggressive samples 
expressing low levels of Wnt5a could be changed. A derivative of UACC-1273, 
transfectant 4-3, which had been transfected with this vector, shows an intermediate level of 
Wnt5a expression in the real time PCR analysis. The increase in Wnt5a expression carries 
over to WNT5A protein abundance as shown by Western blot and by immunohistochemical 
staining (nuclei staining blue, WNT5A staining red) (Figure 7). 

In terms of morphology, cell lines with originally low levels of Wnt5a expression 
showed dramatic changes in morphology and cytoskeletal organization when stably 
transfected with a vector driving Wnt5a expression. The parental line, UACC-1273, is 
spindle shaped with few points of attachment to the culture plate and disorganized actin 
filaments (Figure 8). The transfectants are broader and flatter with many extensions and 
highly polarized actin filaments. 

In order to determine whether there was cross talk between the Wnt5a and Wnt1 
pathways, an assay looking at beta-catenin was used. When Wnt1 signaling is active, 
beta-catenin is localized to the nucleus. In Figure 9, antibody staining for beta-catenin shows that 
the beta-catenin is localized in the cytoplasm and not concentrated in the nucleus. Therefore, 
no cross talk between the two pathways seems to be occurring. 

Protein kinase C (PKC), a downstream target likely to be modulated by Wnt5a, was 
also examined. Wnt5a modulates PKC activity by phosphorylation of some or all of the PKC 
isoforms and not by alteration of PKC transcript levels. As can be seen in Figure 9, increased 
phosphorylated PKC is produced in the transfectants expressing significant levels of the 
Wnt5a transcript, as expected. The isoforms most frequently phosphorylated are mu and 
alpha/beta. This is further evidence that one is looking at the expected Wnt5a pathway. 
PKC is one of the central hubs of signal transduction, and pathways leading to many types of 
cellular action including proliferation, cytoskeletal organization, and cell movement are 
known. 

Increased cell movement and invasiveness were also found to correlate with increased 
Wnt5a expression in a scratch assay and a Boyden chamber assay. Transfectants expressing 
increased levels of Wnt5a show increased competence in filling in open gaps on a cell culture 
dish when compared to cells of the parent cell line (Figure 10). Increased phosphorylated 
PKC was found to correlate with increasing cell invasiveness as measured by a standard test 
for invasiveness, the Boyden chamber assay. 

The first transduction of the Wnt5a signal is accomplished through interaction with a 
G protein coupled, seven transmembrane receptor, frizzled 5. The various cell lines tested 
show varying native levels of fzd5 transcript. In the cell line UACC-1273, the transition 
from low to high Wnt5a expression is not associated with increasing amounts of the receptor. 
The use of an antibody to fzd5 prevents it from responding to Wnt5a and thereby attenuates 
or reverses the phenotypes that increased Wnt5a would normally produce. This is shown in 
the decreased level of phosphorylated PKC upon treatment with the anti-fzd antibody and in 
the decreased invasiveness of Wnt5a transfectants treated with the anti-fzd antibody. 

Other Embodiments 

The foregoing has been a description of certain non-limiting preferred embodiments 
of the invention. Those of ordinary skill in the art will appreciate that various changes and 
modifications to this description may be made without departing from the spirit or scope of 
the present invention, as defined in the following claims. 